Novel proteins

This invention relates to novel proteins, termed AAC74854.1, AAC76768.1, 
AAD05645.1, VP4 and R06O herein identified as adhesion molecules and to the use of 
these proteins and nucleic acid sequences from the encoding genes in the diagnosis, 
prevention and treatment of disease. 

All publications, patents and patent applications cited herein are incorporated in full by 
reference. 

BACKGROUND 

The process of drug discovery is presently undergoing a fundamental revolution as the 
era of functional genomics comes of age. The term "functional genomics" applies to an 
approach utilising bioinformatics tools to ascribe function to protein sequences of 
interest. Such tools are becoming increasingly necessary as the speed of generation of 
sequence data is rapidly outpacing the ability of research laboratories to assign functions 
to these protein sequences. 

As bioinformatics tools increase in potency and in accuracy, these tools are rapidly 
replacing the conventional techniques of biochemical characterisation. Indeed, the 
advanced bioinformatics tools used in identifying the present invention are now capable 
of outputting results in which a high degree of confidence can be placed. 

Various institutions and commercial organisations are examining sequence data as they 
become available and significant discoveries are being made on an on-going basis. 
However, there remains a continuing need to identify and characterise further genes and 
the polypeptides that they encode, as targets for research and for drug discovery. 

Recently, a remarkable tool for the evaluation of sequences of unknown function has 
been developed by the Applicant for the present invention. This tool is a database system, 
termed the Biopendium search database, that is the subject of co-pending United 
Kingdom Patent Application No. GB0006153.1. This database system consists of an 
integrated data resource created using proprietary technology and containing information 
generated from an all-by-all comparison of all available protein or nucleic acid 
sequences. 

The aim behind the integration of these sequence data from separate data resources is to
combine as much data as possible, relating both to the sequences themselves and to
information relevant to each sequence, into one integrated resource. All the available data
relating to each sequence, including data on the three-dimensional structure of the
encoded protein, if this is available, are integrated together to make best use of the
information that is known about each sequence and thus to allow the most educated
predictions to be made from comparisons of these sequences. The annotation that is
generated in the database and which accompanies each sequence entry imparts a
biologically relevant context to the sequence information.

This data resource has made possible the accurate prediction of protein function from
sequence alone. Using conventional technology, this is only possible for proteins that
exhibit a high degree of sequence homology (above about 20% homology) to other
proteins in the same functional family. Accurate predictions are not possible for proteins
that exhibit a very low degree of sequence homology to other related proteins of known
function.

In the present case, a protein whose sequence is recorded in a publicly available database as
AAC74854.1 (NCBI Genebank nucleotide accession number AE000273 and Genebank
protein accession number AAC74854.1), is implicated as a novel member of the adhesion
molecule family.

A second protein, termed AAC76768.1 (NCBI Genebank nucleotide accession number
AE000451 and Genebank protein accession number AAC76768.1) is also implicated as a
member of the adhesion molecule family.

A third protein, termed AAD05645.1 (NCBI Genebank nucleotide accession number
AE001445 and Genebank protein accession number AAD05645.1), is implicated as a novel
member of the adhesion molecule family.

A fourth protein, termed VP4 (NCBI Genebank nucleotide accession number X82323 and 
Genebank protein accession number CAA57766.1) is also implicated as a member of the 
adhesion molecule family. 




A fifth protein, termed R06O (NCBI Genebank nucleotide accession number J04137 and 
SWISS-PROT protein accession number P10155) is also implicated as a member of the 
adhesion molecule family. 

Adhesion molecules are involved in a range of biological processes, including
embryogenesis (Martin-Bermudo, M.D., et al., Development. 2000 Jun;127(12): 2607-15;
Chen, L.M., et al., J Neurosci. 2000 May 15;20(10): 3776-84; Zweegman, S., et al., Exp
Hematol. 2000 Apr;28(4): 401-10; Darribere, T., et al., Biol Cell. 2000 Jan;92(1): 5-25),
maintenance of tissue integrity (Eckes, B., et al., J Cell Sci. 2000;113(Pt 13): 2455-2462;
Buckwalter, J.A., et al., Instr Course Lect. 2000;49: 481-9; Frenette, P.S., et al., J Exp Med.
2000 Apr 17;191(8): 1413-22; Delmas, V., et al., Dev Biol. 1999 Dec 15;216(2): 491-506;
Humphries, M.J., et al., Trends Pharmacol Sci. 2000 Jan;21(1): 29-32; Miosge, N., et al.,
Lab Invest. 1999 Dec;79(12): 1591-9), leukocyte extravasation/inflammation (Lim, L.H., et
al., Am J Respir Cell Mol Biol. 2000 Jun;22(6): 693-701; Johnston, B., et al.,
Microcirculation. 2000 Apr;7(2): 109-18; Mertens, A.V., et al., Clin Exp Allergy. 1993
Oct;23(10): 868-73; Chcialowski, A., et al., Pol Merkuriusz Lek. 2000 Jan;7(43): 13-7;
Rojas, A.I., et al., Crit Rev Oral Biol Med. 1999;10(3): 337-58; Marinova-Mutafchieva, L.,
et al., Arthritis Rheum. 2000 Mar;43(3): 638-44; Vijayan, K.V., et al., J Clin Invest. 2000
Mar;105(6): 793-802; Currie, A.J., et al., J Immunol. 2000 Apr 1;164(7): 3878-86; Rowin,
M.E., et al., Inflammation. 2000 Apr;24(2): 157-73; Johnston, B., et al., J Immunol. 2000
Mar 15;164(6): 3337-44; Gerst, J.L., et al., J Neurosci Res. 2000 Mar 1;59(5): 680-4;
Kagawa, T.F., et al., Proc Natl Acad Sci USA. 2000 Feb 29;97(5): 2235-40; Hillan, K.J., et
al., Liver. 1999 Dec;19(6): 509-18; Panes, J., 1999 Dec;22(10): 514-24; Arao, T., et al., J
Clin Endocrinol Metab. 2000 Jan;85(1): 382-9; Souza, H.S., et al., Gut. 1999 Dec;45(6):
856-63; Grunstein, M.M., et al., Am J Physiol Lung Cell Mol Physiol. 2000 Jun;278(6):
L1154-63; Mertens, A.V., et al., Clin Exp Allergy. 1993 Oct;23(10): 868-73; Berends, C., et
al., Clin Exp Allergy. 1993 Nov;23(11): 926-33; Fernvik, E., et al., Inflammation. 2000
Feb;24(1): 73-87; Bocchino, V., et al., J Allergy Clin Immunol. 2000 Jan;105(1 Pt 1): 65-70),
oncogenesis (Orr, F.W., et al., Cancer. 2000 Jun;88(S12): 2912-2918; Zeller, W., et al.,
J Hematother Stem Cell Res. 1999 Oct;8(5): 539-46; Okada, T., et al., Clin Exp Metastasis.
1999;17(7): 623-9; Mateo, V., et al., Nat Med. 1999 Nov;5(11): 1277-84; Yamaguchi, K., et
al., J Exp Clin Cancer Res. 2000 Mar;19(1): 113-20; Maeshima, Y., et al., J Biol Chem.
2000 Jun 2 [epub ahead of print]; Van Waes, C., et al., Int J [illegible]; Damiano,
[illegible], et al., [illegible] 2000 [illegible]; [illegible], et al., Cancer Metastasis Rev.
1999;18(3): 359-75; Shaw, L.M., J Mammary Gland Biol Neoplasia. 1999 Oct;4(4): 367-76;
Weyant, M., et al., Clin Cancer Res. 2000 Mar;6(3): 949-56) and thrombogenesis (Wang,
Y.G., et al., J Physiol (Lond). 2000 [illegible]; [illegible], et al., Nippon Yakurigaku
Zasshi. 2000 Mar;115(3): 143-[illegible]; [illegible], Cancer J Sci Am. 2000 May;6 Suppl 3:
S245-9; von [illegible], N., et al., [illegible];95(11): 3297-30[illegible]; [illegible], et
al., [illegible] 2000 [illegible];139(6): 927-33; Kroll, H., et al., Thromb Haemost. 2000
Mar;83(3): 392-6).

The detailed characterisation of the structure and function of several adhesion-receptor
families has led to active programs by a number of pharmaceutical companies to develop
adhesion molecule antagonists for use in the treatment of inflammation, oncology and
cardiovascular disease. Adhesion receptors are involved in virtually every aspect of biology
from embryogenesis to apoptosis. They are essential to the structural integrity and
homeostatic functioning of most tissues. It is therefore not surprising that defects in
adhesion receptors cause disease and that many diseases involve modulation of adhesion
molecule function.

The adhesion molecule family in fact represents at least four distinct families which are
linked by their function rather than their structure. Of the four families, three are of
pharmaceutical interest due to small molecule tractability. They are:

1. The integrin family is a superfamily of α and β heterodimeric transmembrane
glycoproteins and is the family which has attracted most pharmaceutical interest. Its
members are large, heavily glycosylated, heterodimeric proteins composed of one of at
least 15 distinct α-subunits in non-covalent linkage with one of at least 8 β-subunits.
Adhesion receptors bind ligands expressed on cell surfaces, extracellular matrix
molecules, and soluble molecules. Integrins are subcategorised based on their β-subunit
usage. The members of this family are summarised below in Table 1.

2. Selectins are a small family of three members: P-, E- and L-selectin. They are glycoproteins,
selectively expressed on cells related to the vasculature, and contain a lectin-binding
domain. The members of this family are described below in Table 2.



3. The immunoglobulin family represents the counter receptor for the integrins and includes
the intercellular adhesion molecules (ICAMs) and vascular cell adhesion molecules
(VCAMs). Members are composed of variable numbers of globular, immunoglobulin- 
like, extracellular domains. Some members of the family, for example, PECAM-1 
(CD31) and NCAM, mediate homotypic adhesion. Some members of the family, for 
example ICAM-1 and VCAM-1, mediate adhesion via interactions with integrins. The 
members of this family are described below in Table 3. 

Adhesion molecules have been shown to play a role in diverse physiological functions, 
many of which can play a role in disease processes. Alteration of their activity is a means to 
alter the disease phenotype and as such identification of novel adhesion molecules is highly 
relevant as they may play a role in many diseases, particularly inflammatory disease, 
oncology, and cardiovascular disease. 



Table 1. Integrins:

Integrin Receptor     Ligand                          Distribution

β1
α1β1                  Laminin, Collagen               Activated T cells, fibroblasts
α2β1                  Collagen, Laminin               Activated T cells, endothelial cells,
                                                      platelets, basophils
α3β1                  Laminin, Collagen,              Basement membrane
                      Fibronectin
α4β1                  VCAM-1 (domains 1 and 4),       Lymphocytes, monocytes, eosinophils,
                      Fibronectin (CS-1 domain),      basophils, mast cells, NK cells
                      MadCAM-1
α5β1                  Fibronectin                     Lymphocytes, monocytes, endothelial cells,
                                                      basophils, mast cells, fibroblasts
α6β1                  Laminin                         Platelets, T cells, eosinophils, monocytes,
                                                      endothelial cells
α9β1                  Tenascin, VCAM-1,               Airway epithelial cells, smooth muscle cells,
                      Osteopontin                     neutrophils
αvβ1                  Vitronectin, fibronectin

β2 (CD18)
LFA-1 (CD11a/CD18)    ICAM-1, -2, -3                  All leukocytes
Mac-1 (CD11b/CD18)    ICAM-1, Fibrinogen, LPS         Granulocytes, monocytes
αD                    ICAM-3, VCAM-1                  Tissue macrophages, monocytes, CD8+ T
                                                      cells, eosinophils

β3
GpIIb/IIIa            Fibrinogen, Fibronectin, vWF    endothelial cells
αV/IIIa               Vitronectin, Fibrinogen, vWF,   Platelets
                      Laminin, Thrombospondin,
                      Osteopontin

β7
α4β7 (LPAM-1)         MAdCAM-1, VCAM-1,               Subset of memory T cells, eosinophils,
                      Fibronectin (CS-1 domain)       basophils, endothelial cells
αEβ7                  E-cadherin                      Intestinal intraepithelial lymphocytes.

Table 2. Selectins:

Receptor              Ligand                          Distribution

E-selectin            Sialyl-LewisX, L-selectin,      Activated endothelial cells
                      LFA-1, ESL-1, PSGL-1
L-selectin            GlyCAM-1, MAdCAM-1, CD34,       Resting leukocytes
                      Sialyl LewisX, E-selectin,
                      P-selectin
P-selectin            Sialyl-LewisX, L-selectin,      Activated endothelial cells, activated platelets
                      PSGL-1



Table 3. Immunoglobulin superfamily:

Receptor              Ligand                          Distribution

ICAM-1                LFA-1 (CD11a/CD18),             Widespread, endothelial cells, fibroblasts,
5 Ig domains          Mac-1 (CD11b/CD18)              epithelium, lymphocytes, dendritic cells,
                                                      chondrocytes.
ICAM-2                LFA-1 (CD11b)                   Endothelial cells (high): lymphocytes,
2 Ig domains                                          monocytes, basophils, platelets (low).
ICAM-3                LFA-1 (αd/CD18)                 Lymphocytes, monocytes, neutrophils,
5 Ig domains                                          eosinophils, basophils.
VCAM-1                α4β1, α4β7                      Endothelial cells, monocytes, fibroblasts,
6 or 7 Ig domains                                     dendritic cells, bone marrow stromal cells,
                                                      myoblasts.
LFA-3                 CD2                             Endothelial cells, leukocytes, epithelial cells
6 Ig domains
PECAM-1               CD31, heparin                   Endothelial cells (at EC-EC junctions), T cell
(CD31)                                                subsets, platelets, neutrophils, eosinophils,
                                                      monocytes, smooth muscle cells, bone marrow
                                                      stem cells.
NCAM                  NCAM, heparin SO4               Neural cells, muscle
MadCAM-1              α4β7, L-selectin                Peyer's patch, mesenteric lymph nodes,
4 Ig domains                                          mucosal endothelial cells, spleen.
CD2                   CD58, CD59, CD48                T lymphocytes



THE INVENTION

The invention is based on the discovery that the AAC74854.1 protein, AAC76768.1
protein, AAD05645.1 protein, VP4 protein and R06O protein function as adhesion
molecules.

In a first aspect, the invention provides a polypeptide, which polypeptide:
(i) has the amino acid sequence as recited in SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID
NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10;

(ii) is a fragment thereof having an antigenic
determinant in common with the polypeptides of (i); or

(iii) is a functional equivalent of (i) or (ii).

The polypeptide having the sequence recited in SEQ ID NO: 2 is referred to hereafter as
"the ADS1 polypeptide". The polypeptide having the sequence recited in SEQ ID NO: 4
is referred to hereafter as "the ADS2 polypeptide". The polypeptide having the sequence
recited in SEQ ID NO: 6 is referred to hereafter as "the ADS3 polypeptide". The
polypeptide having the sequence recited in SEQ ID NO: 8 is referred to hereafter as "the
ADS4 polypeptide". The polypeptide having the sequence recited in SEQ ID NO: 10 is
referred to hereafter as "the ADS5 polypeptide".

In a second aspect, the invention provides a purified nucleic acid molecule which
encodes a polypeptide of the first aspect of the invention. Preferably, the purified nucleic
acid molecule has the nucleic acid sequence as recited in SEQ ID NO: 1 (encoding the
ADS1 polypeptide), SEQ ID NO: 3 (encoding the ADS2 polypeptide), SEQ ID NO: 5
(encoding the ADS3 polypeptide), SEQ ID NO: 7 (encoding the ADS4 polypeptide) or SEQ
ID NO: 9 (encoding the ADS5 polypeptide), or is a redundant equivalent or fragment of
any one of these sequences.
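
By way of illustration only, the notion of a "redundant equivalent" (a nucleic acid sequence that differs from a recited sequence but, owing to the degeneracy of the genetic code, encodes the same polypeptide) may be sketched as follows. The sketch assumes the standard genetic code; the function names and the toy sequences are hypothetical and are not taken from any SEQ ID NO of this specification.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_redundant_equivalent(dna_a: str, dna_b: str) -> bool:
    """True if two different coding sequences encode the same polypeptide."""
    return dna_a.upper() != dna_b.upper() and translate(dna_a) == translate(dna_b)


# Hypothetical toy sequences (assumptions for illustration only).
toy_a = "ATGGCTCTGAAATAA"  # Met-Ala-Leu-Lys-stop
toy_b = "ATGGCCTTAAAGTGA"  # same peptide, different codons
```

Here `is_redundant_equivalent(toy_a, toy_b)` holds because both toy sequences translate to the same four-residue peptide although they differ at the nucleotide level.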

In a third aspect, the invention provides a purified nucleic acid molecule, which 
hybridizes under high stringency conditions with a nucleic acid molecule of the second 
aspect of the invention. 

In a fourth aspect, the invention provides a vector, such as an expression vector, that
contains a nucleic acid molecule of the second or third aspect of the invention.

In a fifth aspect, the invention provides a host cell transformed with a vector of the fourth 
aspect of the invention. 

In a sixth aspect, the invention provides a ligand which binds specifically to, and which
preferably inhibits the adhesion molecule activity of, a polypeptide of the first aspect of
the invention.

In a seventh aspect, the invention provides a compound that is effective to alter the 
expression of a natural gene which encodes a polypeptide of the first aspect of the 
invention or to regulate the activity of a polypeptide of the first aspect of the invention. 

A compound of the seventh aspect of the invention may either increase (agonise) or
decrease (antagonise) the level of expression of the gene or the activity of the
polypeptide. Importantly, the identification of the function of the ADS1, ADS2, ADS3,
ADS4 and ADS5 polypeptides allows for the design of screening methods capable of
identifying compounds that are effective in the treatment and/or diagnosis of disease.

In an eighth aspect, the invention provides a polypeptide of the first aspect of the
invention, or a nucleic acid molecule of the second or third aspect of the invention, or a
vector of the fourth aspect of the invention, or a ligand of the sixth aspect of the invention,
or a compound of the seventh aspect of the invention, for use in therapy or diagnosis. These
molecules may also be used in the manufacture of a medicament for the treatment of
cardiovascular disease, cancer, asthma, COPD, inflammatory disease or bacterial
infections.

In a ninth aspect, the invention provides a method of diagnosing a disease in a patient,
comprising assessing the level of expression of a natural gene encoding a polypeptide of
the first aspect of the invention or the activity of a polypeptide of the first aspect of the
invention in tissue from said patient and comparing said level of expression or activity to
a control level, wherein a level that is different to said control level is indicative of
disease. Such a method will preferably be carried out in vitro. Similar methods may be
used for monitoring the therapeutic treatment of disease in a patient, wherein altering the
level of expression or activity of a polypeptide or nucleic acid molecule over the period
of time towards a control level is indicative of regression of disease.
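
Purely as a sketch, the comparison against a control level and the monitoring of a trend towards that level might be expressed as below. The 20% relative tolerance, the function names and the requirement that every successive measurement be closer to the control are assumptions made for illustration; the specification does not fix any threshold or measurement scheme.

```python
def differs_from_control(level: float, control: float, tolerance: float = 0.2) -> bool:
    """Flag a level whose relative deviation from control exceeds the tolerance.

    The default tolerance of 0.2 (20%) is an assumed, illustrative value.
    """
    if control <= 0:
        raise ValueError("control level must be positive")
    return abs(level - control) / control > tolerance


def moving_towards_control(levels: list[float], control: float) -> bool:
    """True if each successive measurement is strictly closer to the control level."""
    gaps = [abs(x - control) for x in levels]
    return len(gaps) >= 2 and all(
        later < earlier for earlier, later in zip(gaps, gaps[1:])
    )
```

Under these assumptions, a series of measurements taken during treatment that steadily approaches the control level would be read as consistent with regression of disease.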

The adhesion molecules whose sequences are presented in SEQ ID NO: 2 and SEQ ID
NO: 4 are implicated herein in the pathogenicity of the organism Escherichia coli.
Accordingly, ligands to these proteins are likely to be effective in controlling disease
caused by this organism. Furthermore, these proteins provide a potential component for
a vaccine against this organism and the diseases that it causes.

The adhesion molecule whose sequence is presented in SEQ ID NO: 6 is implicated
herein in the pathogenicity of the organism Helicobacter pylori. Accordingly, ligands to
this protein are likely to be effective in controlling disease caused by this organism.
Furthermore, this protein provides a potential component for a vaccine against this
organism and the diseases that it causes.

The adhesion molecule whose sequence is presented in SEQ ID NO: 8 is implicated
herein in the pathogenicity of the organism Human Rotavirus. Accordingly, ligands to
this protein are likely to be effective in controlling disease caused by this organism.
Furthermore, this protein provides a potential component for a vaccine against this
organism and the diseases that it causes.

A preferred method for detecting polypeptides of the first aspect of the invention
comprises the steps of: (a) contacting a ligand, such as an antibody, of the sixth aspect of
the invention with a biological sample under conditions suitable for the formation of a
ligand-polypeptide complex; and (b) detecting said complex.







A number of different such methods according to the ninth aspect of the invention exist,
as the skilled reader will be aware, such as methods of nucleic acid hybridization with
short probes, point mutation analysis, polymerase chain reaction (PCR) amplification and
methods using antibodies to detect aberrant protein levels. Similar methods may be used
on a short or long-term basis to allow therapeutic treatment of a disease to be monitored
in a patient. The invention also provides kits that are useful in these methods for
diagnosing disease.

In a tenth aspect, the invention provides for the use of a polypeptide of the first aspect of 
the invention as an adhesion molecule. 

In an eleventh aspect, the invention provides a pharmaceutical composition comprising a
polypeptide of the first aspect of the invention, or a nucleic acid molecule of the second
or third aspect of the invention, or a vector of the fourth aspect of the invention, or a
ligand of the sixth aspect of the invention, or a compound of the seventh aspect of the
invention, in conjunction with a pharmaceutically-acceptable carrier.

In a twelfth aspect, the present invention provides a polypeptide of the first aspect of the
invention, or a nucleic acid molecule of the second or third aspect of the invention, or a
vector of the fourth aspect of the invention, or a ligand of the sixth aspect of the
invention, or a compound of the seventh aspect of the invention, for use in the
manufacture of a medicament for the diagnosis or treatment of a disease, such as herpes
virus infection.

In a thirteenth aspect, the invention provides a method of treating a disease in a patient
comprising administering to the patient a polypeptide of the first aspect of the invention,
or a nucleic acid molecule of the second or third aspect of the invention, or a vector of the
fourth aspect of the invention, or a ligand of the sixth aspect of the invention, or a
compound of the seventh aspect of the invention.

For diseases in which the expression of a natural gene encoding a polypeptide of the first
aspect of the invention, or in which the activity of a polypeptide of the first aspect of the
invention, is lower in a diseased patient when compared to the level of expression or
activity in a healthy patient, the polypeptide, nucleic acid molecule, ligand or compound
administered to the patient should be an agonist. Conversely, for diseases in which the
expression of the natural gene or activity of the polypeptide is higher in a diseased patient
when compared to the level of expression or activity in a healthy patient, the polypeptide,
nucleic acid molecule, ligand or compound administered to the patient should be an
antagonist. Examples of such antagonists include antisense nucleic acid molecules,
ribozymes and ligands, such as antibodies.

In a fourteenth aspect, the invention provides transgenic or knockout non-human animals
that have been transformed to express higher, lower or absent levels of a polypeptide of
the first aspect of the invention. Such transgenic animals are very useful models for the
study of disease and may also be used in screening regimes for the identification of
compounds that are effective in the treatment or diagnosis of such a disease.

A summary of standard techniques and procedures, which may be employed in order to
utilise the invention, is given below. It will be understood that this invention is not
limited to the particular methodology, protocols, cell lines, vectors and reagents
described. It is also to be understood that the terminology used herein is for the purpose
of describing particular embodiments only and it is not intended that this terminology
should limit the scope of the present invention. The extent of the invention is limited only
by the terms of the appended claims.

Standard abbreviations for nucleotides and amino acids are used in this specification.

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, recombinant DNA
technology and immunology, which are within the skill of those working in the art.
Such techniques are explained fully in the literature. Examples of particularly suitable
texts for consultation include the following: Sambrook Molecular Cloning; A Laboratory
Manual, Second Edition (1989); DNA Cloning, Volumes I and II (D.N Glover ed. 1985);
Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization (B.D.
Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J.
Higgins eds. 1984); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A
Practical Guide to Molecular Cloning (1984); the Methods in Enzymology series
(Academic Press, Inc.), especially volumes 154 & 155; Gene Transfer Vectors for
Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987,
Cold Spring Harbor Laboratory); Immunochemical Methods in Cell and Molecular
Biology (Mayer and Walker, eds. 1987, Academic Press, London); Scopes, (1987)
Protein Purification: Principles and Practice, Second Edition (Springer Verlag, N.Y.); and
Handbook of Experimental Immunology, Volumes I-IV (D.M. Weir and C. C. Blackwell
eds. 1986).

As used herein, the term "polypeptide" includes any peptide or protein comprising two or 
more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e. 
peptide isosteres. This term refers both to short chains (peptides and oligopeptides) and to 
longer chains (proteins). 

The polypeptide of the present invention may be in the form of a mature protein or may
be a pre-, pro- or prepro- protein that can be activated by cleavage of the pre-, pro- or
prepro- portion to produce an active mature polypeptide. In such polypeptides, the pre-,
pro- or prepro- sequence may be a leader or secretory sequence or may be a sequence that
is employed for purification of the mature polypeptide sequence.

The polypeptide of the first aspect of the invention may form part of a fusion protein. For
example, it is often advantageous to include one or more additional amino acid sequences
which may contain secretory or leader sequences, pro-sequences, sequences which aid in
purification, or sequences that confer higher protein stability, for example during
recombinant production. Alternatively or additionally, the mature polypeptide may be
fused with another compound, such as a compound to increase the half-life of the
polypeptide (for example, polyethylene glycol).

Polypeptides may contain amino acids other than the 20 gene-encoded amino acids,
modified either by natural processes, such as by post-translational processing or by
chemical modification techniques which are well known in the art. Among the known
modifications which may commonly be present in polypeptides of the present invention
are glycosylation, lipid attachment, sulphation, gamma-carboxylation, for instance of
glutamic acid residues, hydroxylation and ADP-ribosylation. Other potential
modifications include acetylation, acylation, amidation, covalent attachment of flavin,
covalent attachment of a haeme moiety, covalent attachment of a nucleotide or nucleotide
derivative, covalent attachment of a lipid derivative, covalent attachment of
phosphatidylinositol, cross-linking, cyclization, disulphide bond formation,
demethylation, formation of covalent cross-links, formation of cysteine, formation of
pyroglutamate, formylation, GPI anchor formation, iodination, methylation,
myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation,
racemization, selenoylation, transfer-RNA mediated addition of amino acids to proteins
such as arginylation, and ubiquitination.

Modifications can occur anywhere in a polypeptide, including the peptide backbone, the
amino acid side-chains and the amino or carboxyl termini. In fact, blockage of the amino
or carboxyl terminus in a polypeptide, or both, by a covalent modification is common in
naturally occurring and synthetic polypeptides and such modifications may be present in
polypeptides of the present invention.

The modifications that occur in a polypeptide often will be a function of how the 
polypeptide is made. For polypeptides that are made recombinantly, the nature and extent 
of the modifications in large part will be determined by the post-translational 
modification capacity of the particular host cell and the modification signals that are 
present in the amino acid sequence of the polypeptide in question. For instance,
glycosylation patterns vary between different types of host cell.

The polypeptides of the present invention can be prepared in any suitable manner. Such
polypeptides include isolated naturally occurring polypeptides (for example purified from
cell culture), recombinantly-produced polypeptides (including fusion proteins),
synthetically-produced polypeptides or polypeptides that are produced by a combination
of these methods.

The functionally equivalent polypeptides of the first aspect of the invention may be
polypeptides that are homologous to the ADS1, ADS2, ADS3, ADS4 or ADS5
polypeptides. Two polypeptides are said to be "homologous", as the term is used herein,
if the sequence of one of the polypeptides has a high enough degree of identity or
similarity to the sequence of the other polypeptide. "Identity" indicates that at any
particular position in the aligned sequences, the amino acid residue is identical between
the sequences. "Similarity" indicates that, at any particular position in the aligned
sequences, the amino acid residue is of a similar type between the sequences. Degrees of
etc. Monoclonal antibodies are particularly ust
polypeptides against which they are directed 
monoclonal antibodies of interest may be isolated 
techniques known in the art, and cloned and expre: 

5 Chimeric antibodies, in which non-human variabl 
constant regions (see, for example, Liu et al., I 
(1987)), may also be of use. 

The antibody may be modified to make it less imn 
by humanisation (see Jones et al., Nature, 321, t 

10 239, 1534 (1988); Kabat et al., J. Immunol., 147 ; 
Acad. Sci. USA, 86, 10029 (1989); Gorman et al., 
(1991); and Hodgson et al., Bio/Technology, 9 
antibody ", as used herein, refers to antibody mol 
and selected other amino acids in the variable dom 

15 a non-human donor antibody have been substitutec 
in a human antibody. The humanised antibody thi 
but has the binding ability of the donor antibody. 

In a further alternative, the antibody may fee a ? "b 
having two different antigen-binding domains, < 
20 different epitope. 

. Phage display technology may be utilised to sele< 
binding activities towards the polypeptides of th 
PCR amplified V-genes of lymphocytes from 
relevant antibodies, or from naive libraries (Mc( 
25 552-554; Marks, J. et al., (1992) Biotechnology 
antibodies can also be improved by chain shuffli 
352,624-628). ' 

Antibodies generated by the above techniques, w: 
additional utility in that they may be empl< 
30 radioimmunoassays (RIA) or enzyme-linked imr 






identity and similarity can be readily calculated (Computational Molecular Biology,
Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing. Informatics
and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer
Analysis of Sequence Data, Part 1, Griffin, A.M., and Griffin, H.G., eds., Humana Press,
New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic
Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M
Stockton Press, New York, 1991).
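
As a minimal sketch of how a degree of identity might be computed for two sequences that have already been aligned, consider the following. The function name, the use of "-" as the gap character and the choice to count only columns in which neither sequence has a gap are assumptions; other conventions for the denominator are in use and give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned (non-gap) columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence carries a gap (assumed convention).
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For the hypothetical aligned pair "MKT-LV" and "MRTALV", five columns are compared and four are identical.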

Homologous polypeptides therefore include natural biological variants (for example,
allelic variants or geographical variations within the species from which the polypeptides
are derived) and mutants (such as mutants containing amino acid substitutions, insertions
or deletions) of the ADS1, ADS2, ADS3, ADS4 or ADS5 polypeptides. Such mutants
may include polypeptides in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue (preferably a conserved amino
acid residue) and such substituted amino acid residue may or may not be one encoded by
the genetic code. Typical such substitutions are among Ala, Val, Leu and Ile; among Ser
and Thr; among the acidic residues Asp and Glu; among Asn and Gln; among the basic
residues Lys and Arg; or among the aromatic residues Phe and Tyr. Particularly preferred
are variants in which several, i.e. between 5 and 10, 1 and 5, 1 and 3, 1 and 2 or just 1
amino acids are substituted, deleted or added in any combination. Especially preferred are
silent substitutions, additions and deletions, which do not alter the properties and
activities of the protein. Also especially preferred in this regard are conservative
substitutions.
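
The substitution groups named above lend themselves to a simple illustrative check. The sketch below classifies each position of two equal-length, gap-free sequences as identical, conservatively substituted or non-conservatively substituted; the one-letter encoding of the groups, the function name and the restriction to substitutions (insertions and deletions are not handled) are assumptions made for illustration.

```python
# Groups taken from the text: Ala/Val/Leu/Ile; Ser/Thr; Asp/Glu; Asn/Gln; Lys/Arg; Phe/Tyr.
CONSERVATIVE_GROUPS = [
    set("AVLI"), set("ST"), set("DE"), set("NQ"), set("KR"), set("FY"),
]


def classify_substitutions(reference: str, variant: str) -> dict[str, int]:
    """Count identical, conservative and non-conservative positions."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    counts = {"identical": 0, "conservative": 0, "non_conservative": 0}
    for ref, var in zip(reference.upper(), variant.upper()):
        if ref == var:
            counts["identical"] += 1
        elif any(ref in group and var in group for group in CONSERVATIVE_GROUPS):
            counts["conservative"] += 1
        else:
            counts["non_conservative"] += 1
    return counts
```

With the hypothetical pair "MALKDS" and "MVLRDW", the Ala to Val and Lys to Arg changes fall within the listed groups, whereas Ser to Trp does not.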

Such mutants also include polypeptides in which one or more of the amino acid residues 
include a substituent group. 

Typically, greater than 50% identity between two polypeptides is considered to be an 
indication of functional equivalence. Preferably, functionally equivalent polypeptides of 
the first aspect of the invention have a degree of sequence identity with the ADS1, ADS2, 
ADS3, ADS4 or ADS5 polypeptide or with active fragments thereof, of greater than 
50%. More preferred polypeptides have degrees of identity of greater than 60%, 70%, 
80%, 90%, 95%, 98% or 99%, respectively. 
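
The identity thresholds above can be illustrated with a short calculation. The following Python sketch assumes the two sequences have already been aligned (gaps shown as "-") and counts identity over the columns in which both sequences carry a residue. This denominator is only one of several conventions in use, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if identity is strictly greater than the threshold (50% by default)."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical five-residue peptides differing at one position.
print(percent_identity("MKTAY", "MKSAY"))
```

The default threshold mirrors the "greater than 50%" figure in the text; the stricter preferred values (60% to 99%) can be passed as the threshold argument.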

The polypeptides of the present invention or their immunogenic fragments (comprising at 
least one antigenic determinant) can be used to generate ligands, such as polyclonal or 
monoclonal antibodies, that are immunospecific for the polypeptides. Such antibodies 
may be employed to isolate or to identify clones expressing the polypeptides of the 
invention or to purify the polypeptides by affinity chromatography. The antibodies may 
also be employed as diagnostic or therapeutic aids, amongst other applications, as will be 
apparent to the skilled reader. 

The term "immunospecific" means that the antibodies have substantially greater affinity 
for the polypeptides of the invention than their affinity for other related polypeptides in 
the prior art. As used herein, the term "antibody" refers to intact molecules as well as to 
fragments thereof, such as Fab, F(ab')2 and Fv, which are capable of binding to the 
antigenic determinant in question. Such antibodies thus bind to the polypeptides of the 
first aspect of the invention. 

If polyclonal antibodies are desired, a selected mammal, such as a mouse, rabbit, goat or 
horse, may be immunised with a polypeptide of the first aspect of the invention. The 
polypeptide used to immunise the animal can be derived by recombinant DNA 
technology or can be synthesized chemically. If desired, the polypeptide can be 
conjugated to a carrier protein. Commonly used carriers to which the polypeptides may 
be chemically coupled include bovine serum albumin, thyroglobulin and keyhole limpet 
haemocyanin. The coupled polypeptide is then used to immunise the animal. Serum from 
the immunised animal is collected and treated according to known procedures, for 
example by immunoaffinity chromatography. 

Monoclonal antibodies to the polypeptides of the first aspect of the invention can also be 
readily produced by one skilled in the art. The general methodology for making 
monoclonal antibodies using hybridoma technology is well known (see, for example, 
Kohler, G. and Milstein, C., Nature 256: 495-497 (1975); Kozbor et al., Immunology 
Today 4: 72 (1983); Cole et al., 77-96 in Monoclonal Antibodies and Cancer Therapy, 
Alan R. Liss, Inc. (1985)). 

Panels of monoclonal antibodies produced against the polypeptides of the first aspect of 
the invention can be screened for various properties, i.e., for isotype, epitope, affinity, 
etc. Monoclonal antibodies are particularly useful in purification of the individual 
polypeptides against which they are directed. Alternatively, genes encoding the 
monoclonal antibodies of interest may be isolated from hybridomas, for instance by PCR 
techniques known in the art, and cloned and expressed in appropriate vectors. 

Chimeric antibodies, in which non-human variable regions are joined or fused to human 
constant regions (see, for example, Liu et al., Proc. Natl. Acad. Sci. USA, 84, 3439 
(1987)), may also be of use. 

The antibody may be modified to make it less immunogenic in an individual, for example 
by humanisation (see Jones et al., Nature, 321, 522 (1986); Verhoeyen et al., Science, 
239, 1534 (1988); Kabat et al., J. Immunol., 147, 1709 (1991); Queen et al., Proc. Natl. 
Acad. Sci. USA, 86, 10029 (1989); Gorman et al., Proc. Natl. Acad. Sci. USA, 88, 4181 
(1991); and Hodgson et al., Bio/Technology, 9, 421 (1991)). The term "humanised 
antibody", as used herein, refers to antibody molecules in which the CDR amino acids 
and selected other amino acids in the variable domains of the heavy and/or light chains of 
a non-human donor antibody have been substituted in place of the equivalent amino acids 
in a human antibody. The humanised antibody thus closely resembles a human antibody 
but has the binding ability of the donor antibody. 

In a further alternative, the antibody may be a "bispecific" antibody, that is an antibody 
having two different antigen-binding domains, each domain being directed against a 
different epitope. 

Phage display technology may be utilised to select genes which encode antibodies with 
binding activities towards the polypeptides of the invention either from repertoires of 
PCR amplified V-genes of lymphocytes from humans screened for possessing the 
relevant antibodies, or from naive libraries (McCafferty, J. et al., (1990) Nature 348, 
552-554; Marks, J. et al., (1992) Biotechnology 10, 779-783). The affinity of these 
antibodies can also be improved by chain shuffling (Clackson, T. et al., (1991) Nature 
352, 624-628). 

Antibodies generated by the above techniques, whether polyclonal or monoclonal, have 
additional utility in that they may be employed as reagents in immunoassays, 
radioimmunoassays (RIA) or enzyme-linked immunosorbent assays (ELISA). In these 
applications, the antibodies can be labelled with an analytically detectable reagent such as 
a radioisotope, a fluorescent molecule or an enzyme. 

Preferred nucleic acid molecules of the second and third aspects of the invention are 
those which encode the polypeptide sequences recited in SEQ ID NO: 2, SEQ ID NO: 4, 
SEQ ID NO: 6, SEQ ID NO: 8 and SEQ ID NO: 10, and functionally equivalent 
polypeptides. These nucleic acid molecules may be used in the methods and applications 
described herein. The nucleic acid molecules of the invention preferably comprise at least 
n consecutive nucleotides from the sequences disclosed herein where, depending on the 
particular sequence, n is 10 or more (for example, 12, 14, 15, 18, 20, 25, 30, 35, 40 or 
more). 

The nucleic acid molecules of the invention also include sequences that are 
complementary to nucleic acid molecules described above (for example, for antisense or 
probing purposes). 
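
For illustration, the complementary strand of a DNA sequence, read 5' to 3', can be derived as in the following Python sketch. The function name and the example sequence are hypothetical and are not taken from the sequence listing.

```python
# Watson-Crick base pairing for DNA, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical example sequence.
print(reverse_complement("ATGGCCAAG"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when designing antisense or probe sequences.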

Nucleic acid molecules of the present invention may be in the form of RNA, such as 
mRNA, or in the form of DNA, including, for instance, cDNA, synthetic DNA or 
genomic DNA. Such nucleic acid molecules may be obtained by cloning, by chemical 
synthetic techniques or by a combination thereof. The nucleic acid molecules can be 
prepared, for example, by chemical synthesis using techniques such as solid phase 
phosphoramidite chemical synthesis, from genomic or cDNA libraries or by separation 
from an organism. RNA molecules may generally be generated by the in vitro or in vivo 
transcription of DNA sequences. 

The nucleic acid molecules may be double-stranded or single-stranded. Single-stranded 
DNA may be the coding strand, also known as the sense strand, or it may be the 
non-coding strand, also referred to as the anti-sense strand. 

The term "nucleic acid molecule" also includes analogues of DNA and RNA, such as 
those containing modified backbones, and peptide nucleic acids (PNA). The term "PNA", 
as used herein, refers to an antisense molecule or an anti-gene agent that comprises an 
oligonucleotide of at least five nucleotides in length linked to a peptide backbone of 
amino acid residues, which preferably ends in lysine. The terminal lysine confers 
solubility to the composition. PNAs may be pegylated to extend their lifespan in a cell, 
where they preferentially bind complementary single stranded DNA and RNA and stop 
transcript elongation (Nielsen, P.E. et al. (1993) Anticancer Drug Des. 8: 53-63). 

A nucleic acid molecule which encodes the polypeptide of SEQ ID NO: 2 may be 
identical to the coding sequence of the nucleic acid molecule shown in SEQ ID NO: 1. A 
nucleic acid molecule which encodes the polypeptide of SEQ ID NO: 4 may be identical 
to the coding sequence of the nucleic acid molecule shown in SEQ ID NO: 3. A nucleic 
acid molecule which encodes the polypeptide of SEQ ID NO: 6 may be identical to the 
coding sequence of the nucleic acid molecule shown in SEQ ID NO: 5. A nucleic acid 
molecule which encodes the polypeptide of SEQ ID NO: 8 may be identical to the coding 
sequence of the nucleic acid molecule shown in SEQ ID NO: 7. A nucleic acid molecule 
which encodes the polypeptide of SEQ ID NO: 10 may be identical to the coding 
sequence of the nucleic acid molecule shown in SEQ ID NO: 9. 

These molecules also may have a different sequence which, as a result of the degeneracy 
of the genetic code, encodes a polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID 
NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10. Such nucleic acid molecules that encode the 
polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8 or SEQ ID 
NO: 10 may include, but are not limited to, the coding sequence for the mature 
polypeptide by itself; the coding sequence for the mature polypeptide and additional 
coding sequences, such as those encoding a leader or secretory sequence, such as a pro-, 
pre- or prepro- polypeptide sequence; the coding sequence of the mature polypeptide, 
with or without the aforementioned additional coding sequences, together with further 
additional, non-coding sequences, including non-coding 5' and 3' sequences, such as the 
transcribed, non-translated sequences that play a role in transcription (including 
termination signals), ribosome binding and mRNA stability. The nucleic acid molecules 
may also include additional sequences which encode additional amino acids, such as 
those that provide additional functionalities. 
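
The effect of the degeneracy of the genetic code can be illustrated as follows. This Python sketch builds the standard genetic code table and shows that two different coding sequences can encode the same peptide. The example sequences are hypothetical and are not taken from the sequence listing; the standard (nuclear) code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each position ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence codon by codon (one-letter amino acids)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical coding sequences that differ at two third-codon positions.
seq_a = "ATGGCTAAA"
seq_b = "ATGGCCAAG"
print(translate(seq_a), translate(seq_b))
```

Both sequences translate to the same three-residue peptide, because GCT and GCC both encode alanine and AAA and AAG both encode lysine.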

The nucleic acid molecules of the second and third aspects of the invention may also 
encode the fragments or the functional equivalents of the polypeptides and fragments of 
the first aspect of the invention. Such a nucleic acid molecule may be a naturally 
occurring variant such as a naturally occurring allelic variant, or the molecule may be a 
variant that is not known to occur naturally. Such non-naturally occurring variants of the 
nucleic acid molecule may be made by mutagenesis techniques, including those applied 
to nucleic acid molecules, cells or organisms. 

Among variants in this regard are variants that differ from the aforementioned nucleic 
acid molecules by nucleotide substitutions, deletions or insertions. The substitutions, 
deletions or insertions may involve one or more nucleotides. The variants may be altered 
in coding or non-coding regions or both. Alterations in the coding regions may produce 
conservative or non-conservative amino acid substitutions, deletions or insertions. 

The nucleic acid molecules of the invention can also be engineered, using methods 
generally known in the art, for a variety of reasons, including modifying the cloning 
and/or expression of the gene product (the polypeptide). DNA shuffling by 
random fragmentation and PCR reassembly of gene fragments and synthetic 
oligonucleotides are included as techniques which may be used to engineer the 
nucleotide sequences. Site-directed mutagenesis may be used to insert new restriction 
sites, alter glycosylation patterns, change codon preference, produce splice variants, 
introduce mutations and so forth. 

Nucleic acid molecules which encode a polypeptide of the first aspect of the invention 
may be ligated to a heterologous sequence so that the combined nucleic acid molecule 
encodes a fusion protein. Such combined nucleic acid molecules are included within the 
second or third aspects of the invention. For example, to screen peptide libraries for 
inhibitors of the activity of the polypeptide, it may be useful to express, using such a 
combined nucleic acid molecule, a fusion protein that can be recognised by a 
commercially-available antibody. A fusion protein may also be engineered to contain a 
cleavage site located between the sequence of the polypeptide of the invention and the 
sequence of a heterologous protein so that the polypeptide may be cleaved and purified 
away from the heterologous protein. 

The nucleic acid molecules of the invention also include antisense molecules that are 
partially complementary to nucleic acid molecules encoding polypeptides of the present 
invention and that therefore hybridize to the encoding nucleic acid molecules 
(hybridization). Such antisense molecules, such as oligonucleotides, can be designed to 
recognise, specifically bind to and prevent transcription of a target nucleic acid encoding 
a polypeptide of the invention, as will be known by those of ordinary skill in the art (see, 
for example, Cohen, J.S., Trends in Pharm. Sci., 10, 435 (1989); Okano, J. Neurochem. 
56, 560 (1991); O'Connor, J. Neurochem. 56, 560 (1991); Lee et al., Nucleic Acids Res. 6, 
3073 (1979); Cooney et al., Science 241, 456 (1988); Dervan et al., Science 251, 1360 
(1991)). 

The term "hybridization" as used here refers to the association of two nucleic acid 
molecules with one another by hydrogen bonding. Typically, one molecule will be fixed 
to a solid support and the other will be free in solution. Then, the two molecules may be 
placed in contact with one another under conditions that favour hydrogen bonding. 
Factors that affect this bonding include: the type and volume of solvent; reaction 
temperature; time of hybridization; agitation; agents to block the non-specific attachment 
of the liquid phase molecule to the solid support (Denhardt's reagent or BLOTTO); the 
concentration of the molecules; use of compounds to increase the rate of association of 
molecules (dextran sulphate or polyethylene glycol); and the stringency of the washing 
conditions following hybridization (see Sambrook et al. [supra]). 

The inhibition of hybridization of a completely complementary molecule to a target 
molecule may be examined using a hybridization assay, as known in the art (see, for 
example, Sambrook et al. [supra]). A substantially homologous molecule will then 
compete for and inhibit the binding of a completely homologous molecule to the target 
molecule under various conditions of stringency, as taught in Wahl, G.M. and S.L. 
Berger (1987; Methods Enzymol. 152: 399-407) and Kimmel, A.R. (1987; Methods 
Enzymol. 152: 507-511). 

"Stringency" refers to conditions in a hybridization reaction that favour the association of 
very similar molecules over association of molecules that differ. High stringency 
hybridisation conditions are defined as overnight incubation at 42°C in a solution 
comprising 50% formamide, 5X SSC (150mM NaCl, 15mM trisodium citrate), 50mM 
sodium phosphate (pH 7.6), 5X Denhardt's solution, 10% dextran sulphate, and 20 
microgram/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 
0.1X SSC at approximately 65°C. Low stringency conditions involve the hybridisation 
reaction being carried out at 35°C (see Sambrook et al. [supra]). Preferably, the 
conditions used for hybridization are those of high stringency. 
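
As a worked illustration of the salt concentrations involved, the following Python sketch scales the components of SSC to the strengths mentioned above. It assumes the conventional definition of 1X SSC as 150 mM NaCl and 15 mM trisodium citrate, which are the figures given in the parenthesis above; the names used are hypothetical.

```python
# Assumed composition of 1X SSC, in millimolar (mM).
SSC_1X_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_concentrations(fold: float) -> dict:
    """Return the mM concentration of each SSC component at the given strength."""
    return {solute: fold * mm for solute, mm in SSC_1X_MM.items()}


hybridisation = ssc_concentrations(5)   # 5X SSC hybridisation solution
wash = ssc_concentrations(0.1)          # 0.1X SSC wash solution
print(hybridisation)
print(wash)
```

On this assumption the wash solution contains one fiftieth of the salt present in the hybridisation solution. The combination of low salt and high temperature in the wash is what makes these conditions stringent.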

Preferred embodiments of this aspect of the invention are nucleic acid molecules that are 
at least 70% identical over their entire length to a nucleic acid molecule encoding the 
ADS1 polypeptide (SEQ ID NO: 2), ADS2 polypeptide (SEQ ID NO: 4), ADS3 
polypeptide (SEQ ID NO: 6), ADS4 polypeptide (SEQ ID NO: 8) or ADS5 polypeptide 
(SEQ ID NO: 10) and nucleic acid molecules that are substantially complementary to 
such nucleic acid molecules. Preferably, a nucleic acid molecule according to this aspect 
of the invention comprises a region that is at least 80% identical over its entire length to 
the nucleic acid molecule having the sequence given in SEQ ID NO: 1, SEQ ID NO: 3, 
SEQ ID NO: 5, SEQ ID NO: 7 or SEQ ID NO: 9 or a nucleic acid molecule that is 
complementary thereto. In this regard, nucleic acid molecules at least 90%, preferably at 
least 95%, more preferably at least 98% or 99% identical over their entire length to the 
same are particularly preferred. Preferred embodiments in this respect are nucleic acid 
molecules that encode polypeptides that retain substantially the same biological function 
or activity as the ADS1, ADS2, ADS3, ADS4 or ADS5 polypeptides. 

The invention also provides a process for detecting a nucleic acid molecule of the 
invention, comprising the steps of: (a) contacting a nucleic acid probe according to the 
invention with a biological sample under hybridizing conditions to form duplexes; and 
(b) detecting any such duplexes that are formed. 

As discussed additionally below in connection with assays that may be utilised according 
to the invention, a nucleic acid molecule as described above may be used as a 
hybridization probe for RNA, cDNA or genomic DNA, in order to isolate full-length 
cDNAs and genomic clones encoding the ADS1, ADS2, ADS3, ADS4 or ADS5 
polypeptides and to isolate cDNA and genomic clones of homologous or orthologous 
genes that have a high sequence similarity to the gene encoding this polypeptide. 

In this regard, the following techniques, among others known in the art, may be utilised 
and are discussed below for purposes of illustration. Methods for DNA sequencing and 
analysis are well known and are generally available in the art and may, indeed, be used to 
practice many of the embodiments of the invention discussed herein. Such methods may 
employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase (US 
Biochemical Corp, Cleveland, OH), Taq polymerase (Perkin Elmer), thermostable T7 
polymerase (Amersham, Chicago, IL), or combinations of polymerases and proof-reading 
exonucleases such as those found in the ELONGASE Amplification System marketed by 
Gibco/BRL (Gaithersburg, MD). Preferably, the sequencing process may be automated 
using machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno, NV), the Peltier 
Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI Catalyst and 373 
and 377 DNA Sequencers (Perkin Elmer). 

One method for isolating a nucleic acid molecule encoding a polypeptide with an 
equivalent function to that of the ADS1, ADS2, ADS3, ADS4 or ADS5 polypeptides is to 
probe a genomic or cDNA library with a natural or artificially-designed probe using 
standard procedures that are recognised in the art (see, for example, "Current Protocols in 
Molecular Biology", Ausubel et al. (eds.), Greene Publishing Association and John Wiley 
Interscience, New York, 1989, 1992). Probes comprising at least 15, preferably at least 
30, and more preferably at least 50, contiguous bases that correspond to, or are 
complementary to, nucleic acid sequences from the appropriate encoding gene (SEQ ID 
NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7 or SEQ ID NO: 9), are particularly 
useful probes. Such probes may be labelled with an analytically detectable reagent to 
facilitate their identification. Useful reagents include, but are not limited to, 
radioisotopes, fluorescent dyes and enzymes that are capable of catalysing the formation 
of a detectable product. Using these probes, the ordinarily skilled artisan will be capable 
of isolating complementary copies of genomic DNA, cDNA or RNA polynucleotides 
encoding proteins of interest from human, mammalian or other animal sources and 
screening such sources for related sequences, for example, for additional members of the 
family, type and/or subtype. 
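
Purely as an illustration of the probe lengths mentioned above, the following Python sketch enumerates every window of a chosen number of contiguous bases from a sequence. The function name and the short example sequence are hypothetical; in practice the input would be one of the encoding sequences, and candidates would be further screened for specificity.

```python
from typing import Iterator, Tuple


def candidate_probes(sequence: str, length: int = 15) -> Iterator[Tuple[int, str]]:
    """Yield (start, probe) for every window of `length` contiguous bases."""
    if length < 1:
        raise ValueError("probe length must be at least 1")
    # range() is empty when the sequence is shorter than the probe length.
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


# Hypothetical ten-base sequence scanned with an eight-base window.
for start, probe in candidate_probes("ATGGCCAAGT", length=8):
    print(start, probe)
```

The default window of 15 bases reflects the minimum length given in the text; the preferred lengths of 30 or 50 bases can be passed as the length argument.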

In many cases, isolated cDNA sequences will be incomplete, in that the region encoding 
the polypeptide will be cut short, normally at the 5' end. Several methods are available to 
obtain full length cDNAs, or to extend short cDNAs. Such sequences may be extended 
utilising a partial nucleotide sequence and employing various methods known in the art to 
detect upstream sequences such as promoters and regulatory elements. For example, one 
method which may be employed is based on the method of Rapid Amplification of cDNA 
Ends (RACE; see, for example, Frohman et al., PNAS USA 85, 8998-9002, 1988). 
Recent modifications of this technique, exemplified by the Marathon™ technology 
(Clontech Laboratories Inc.), for example, have significantly simplified the search for 
longer cDNAs. A slightly different technique, termed "restriction-site" PCR, uses 
universal primers to retrieve unknown nucleic acid sequence adjacent a known locus 
(Sarkar, G. (1993) PCR Methods Applic. 2: 318-322). Inverse PCR may also be used to 
amplify or to extend sequences using divergent primers based on a known region (Triglia, 
T. et al. (1988) Nucleic Acids Res. 16: 8186). Another method which may be used is 
capture PCR which involves PCR amplification of DNA fragments adjacent a known 
sequence in human and yeast artificial chromosome DNA (Lagerstrom, M. et al. (1991) 
PCR Methods Applic., 1, 111-119). Another method which may be used to retrieve 
unknown sequences is that of Parker, J.D. et al. ((1991) Nucleic Acids Res. 19: 3055- 
3060). Additionally, one may use PCR, nested primers, and PromoterFinder™ libraries 
to walk genomic DNA (Clontech, Palo Alto, CA). This process avoids the need to screen 
libraries and is useful in finding intron/exon junctions. 

When screening for full-length cDNAs, it is preferable to use libraries that have been 
size-selected to include larger cDNAs. Also, random-primed libraries are preferable, in 
that they will contain more sequences that contain the 5' regions of genes. Use of a 
randomly primed library may be especially preferable for situations in which an oligo 
d(T) library does not yield a full-length cDNA. Genomic libraries may be useful for 
extension of sequence into 5' non-transcribed regulatory regions. 

In one embodiment of the invention, the nucleic acid molecules of the present invention 
may be used for chromosome localisation. In this technique, a nucleic acid molecule is 
specifically targeted to, and can hybridize with, a particular location on an individual 
human chromosome. The mapping of relevant sequences to chromosomes according to 
the present invention is an important step in the confirmatory correlation of those 
sequences with the gene-associated disease. Once a sequence has been mapped to a 
precise chromosomal location, the physical position of the sequence on the chromosome 
can be correlated with genetic map data. Such data are found in, for example, V. 
McKusick, Mendelian Inheritance in Man (available on-line through Johns Hopkins 
University Welch Medical Library). The relationships between genes and diseases that 
have been mapped to the same chromosomal region are then identified through linkage 
analysis (coinheritance of physically adjacent genes). This provides valuable information 
to investigators searching for disease genes using positional cloning or other gene 
discovery techniques. Once the disease or syndrome has been crudely localised by 
genetic linkage to a particular genomic region, any sequences mapping to that area may 
represent associated or regulatory genes for further investigation. The nucleic acid 
molecule may also be used to detect differences in the chromosomal location due to 
translocation, inversion, etc. among normal, carrier, or affected individuals. 

The nucleic acid molecules of the present invention are also valuable for tissue 
localisation. Such techniques allow the determination of expression patterns of the 
polypeptide in tissues by detection of the mRNAs that encode them. These techniques 
include in situ hybridization techniques and nucleotide amplification techniques, such as 
PCR. Results from these studies provide an indication of the normal functions of the 
polypeptide in the organism. In addition, comparative studies of the normal expression 
pattern of mRNAs with that of mRNAs encoded by a mutant gene provide valuable 
insights into the role of mutant polypeptides in disease. Such inappropriate expression 
may be of a temporal, spatial or quantitative nature. 

The vectors of the present invention comprise nucleic acid molecules of the invention and 
may be cloning or expression vectors. The host cells of the invention, which may be 
transformed, transfected or transduced with the vectors of the invention, may be 
prokaryotic or eukaryotic. 

The polypeptides of the invention may be prepared in recombinant form by expression of 
their encoding nucleic acid molecules in vectors contained within a host cell. Such 
expression methods are well known to those of skill in the art and many are described in 
detail by Sambrook et al. (supra) and Fernandez & Hoeffler (1998, eds. "Gene expression 
systems. Using nature for the art of expression". Academic Press, San Diego, London, 
Boston, New York, Sydney, Tokyo, Toronto). 

Generally, any system or vector that is suitable to maintain, propagate or express nucleic 
acid molecules to produce a polypeptide in the required host may be used. The 
appropriate nucleotide sequence may be inserted into an expression system by any of a 
variety of well-known and routine techniques, such as, for example, those described in 
Sambrook et al. (supra). Generally, the encoding gene can be placed under the control of 
a control element such as a promoter, ribosome binding site (for bacterial expression) 
and, optionally, an operator, so that the DNA sequence encoding the desired polypeptide 
is transcribed into RNA in the transformed host cell. 

Examples of suitable expression systems include, for example, chromosomal, episomal 
and virus-derived systems, including, for example, vectors derived from: bacterial 
plasmids, bacteriophage, transposons, yeast episomes, insertion elements, yeast 
chromosomal elements, viruses such as baculoviruses, papova viruses such as SV40, 
vaccinia viruses, adenoviruses, fowl pox viruses, pseudorabies viruses and retroviruses, 
or combinations thereof, such as those derived from plasmid and bacteriophage genetic 
elements, including cosmids and phagemids. Human artificial chromosomes (HACs) may 
also be employed to deliver larger fragments of DNA than can be contained and 
expressed in a plasmid. 

Particularly suitable expression systems include microorganisms such as bacteria 
transformed with recombinant bacteriophage, plasmid or cosmid DNA expression 
vectors; yeast transformed with yeast expression vectors; insect cell systems infected 
with virus expression vectors (for example, baculovirus); plant cell systems transformed 
with virus expression vectors (for example, cauliflower mosaic virus, CaMV; tobacco 
mosaic virus, TMV) or with bacterial expression vectors (for example, Ti or pBR322 
plasmids); or animal cell systems. Cell-free translation systems can also be employed to 
produce the polypeptides of the invention. 

Introduction of nucleic acid molecules encoding a polypeptide of the present invention 
into host cells can be effected by methods described in many standard laboratory 
manuals, such as Davis et al., Basic Methods in Molecular Biology (1986) and Sambrook 
et al. [supra]. Particularly suitable methods include calcium phosphate transfection, 
DEAE-dextran mediated transfection, transvection, microinjection, cationic 
lipid-mediated transfection, electroporation, transduction, scrape loading, ballistic 
introduction or infection (see Sambrook et al., 1989 [supra]; Ausubel et al., 1991 
[supra]; Spector, Goldman & Leinwand, 1998). In eukaryotic cells, expression systems 
may either be transient (for example, episomal) or permanent (chromosomal integration) 
according to the needs of the system. 

The encoding nucleic acid molecule may or may not include a sequence encoding a 
control sequence, such as a signal peptide or leader sequence, as desired, for example, for 
5 secretion of the translated polypeptide into the lumen of the endoplasmic reticulum, into 
the periplasmic space or into the extracellular environment. These signals may be 
endogenous to the polypeptide or they may be heterologous signals. Leader sequences 
can be removed by the bacterial host in post-translational processing. 

In addition to control sequences, it may be desirable to add regulatory sequences that 

10 allow for regulation of the expression of the polypeptide relative to the growth of the host 
cell. Examples of regulatory sequences are those which cause the expression of a gene to 
be increased or decreased in response to a chemical or physical stimulus, including the 
presence of a regulatory compound or to various temperature or metabolic conditions. 
Regulatory sequences are those non-translated regions of the vector, such as enhancers, 

15 promoters and 5' and 3' untranslated regions. These interact with host cellular proteins to 
carry out transcription and translation. Such regulatory sequences may vary in their 
strength and specificity. Depending on the vector system and host utilised, any number of 
suitable transcription and translation elements, including constitutive and inducible 
promoters, may be used. For example, when cloning in bacterial systems, inducible 
promoters such as the hybrid lacZ promoter of the Bluescript phagemid (Stratagene, 
La Jolla, CA) or pSport1™ plasmid (Gibco BRL) and the like may be used. The 
baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers 
derived from the genomes of plant cells (for example, heat shock, RUBISCO and storage 
protein genes) or from plant viruses (for example, viral promoters or leader sequences) 
may be cloned into the vector. In mammalian cell systems, promoters from mammalian 
genes or from mammalian viruses are preferable. If it is necessary to generate a cell line 
that contains multiple copies of the sequence, vectors based on SV40 or EBV may be 
used with an appropriate selectable marker. 

An expression vector is constructed so that the particular nucleic acid coding sequence is 
located in the vector with the appropriate regulatory sequences, the positioning and 
orientation of the coding sequence with respect to the regulatory sequences being such 
that the coding sequence is transcribed under the "control" of the regulatory sequences, 
i.e., RNA polymerase which binds to the DNA molecule at the control sequences 
transcribes the coding sequence. In some cases it may be necessary to modify the 
sequence so that it may be attached to the control sequences with the appropriate 
orientation; i.e., to maintain the reading frame. 
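
By way of illustration only, the reading-frame requirement can be sketched as a short check. The sketch assumes a hypothetical fusion in which a vector-encoded leader, beginning at its start codon, is joined directly to the coding sequence; the function names and toy sequences below are assumptions made for the example and do not appear in the specification.

```python
# Hypothetical helpers: check whether a coding insert joined downstream of a
# vector-encoded leader is still read in its own reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame(leader_cds: str, linker: str = "") -> bool:
    """True if the leader plus any linker spans a whole number of codons."""
    return (len(leader_cds) + len(linker)) % 3 == 0


def first_stop(seq: str) -> int:
    """Index of the first stop codon read in frame from position 0, or -1 if none."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


leader = "ATGAAACACCAT"   # toy leader: 12 nt, i.e. 4 codons
insert = "GGTTCTGCTTAA"   # toy insert ending in a TAA stop codon

fusion_ok = leader + insert              # insert stays in frame
fusion_shifted = leader + "C" + insert   # one extra base shifts the frame
```

In this toy case the direct fusion keeps the stop codon of the insert in frame, whereas a single additional base between leader and insert shifts the frame so that the stop codon is no longer read.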

The control sequences and other regulatory sequences may be ligated to the nucleic acid 
coding sequence prior to insertion into a vector. Alternatively, the coding sequence can 
be cloned directly into an expression vector that already contains the control sequences 
and an appropriate restriction site. 

For long-term, high-yield production of a recombinant polypeptide, stable expression is 
preferred. For example, cell lines, which stably express the polypeptide of interest, may 
be transformed using expression vectors which may contain viral origins of replication 
and/or endogenous expression elements and a selectable marker gene on the same or on a 
separate vector. Following the introduction of the vector, cells may be allowed to grow 
for 1-2 days in an enriched medium before they are switched to a selective medium. The 
purpose of the selectable marker is to confer resistance to selection, and its presence 
allows growth and recovery of cells that successfully express the introduced sequences. 
Resistant clones of stably transformed cells may be proliferated using tissue culture 
techniques appropriate to the cell type. 

Mammalian cell lines available as hosts for expression are known in the art and include 
many immortalised cell lines available from the American Type Culture Collection 
(ATCC) including, but not limited to, Chinese hamster ovary (CHO), HeLa, baby 
hamster kidney (BHK), monkey kidney (COS), C127, 3T3, HEK 293, Bowes 
melanoma and human hepatocellular carcinoma (for example Hep G2) cells and a 
number of other cell lines. 

In the baculovirus system, the materials for baculovirus/insect cell expression systems are 
commercially available in kit form from, inter alia, Invitrogen, San Diego CA (the 
"MaxBac" kit). These techniques are generally known to those skilled in the art and are 
described fully in Summers and Smith, Texas Agricultural Experiment Station Bulletin 
No. 1555 (1987). Particularly suitable host cells for use in this system include insect cells 
such as Drosophila S2 and Spodoptera Sf9 cells. 

There are many plant cell culture and whole plant genetic expression systems known in 
the art. Examples of suitable plant cellular genetic expression systems include those 
described in US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of 
genetic expression in plant cell culture have been described by Zenk, Phytochemistry 30, 
3861-3863 (1991). 

In particular, all plants from which protoplasts can be isolated and cultured to give whole 
regenerated plants can be utilised, so that whole plants are recovered which contain the 
transferred gene. Practically all plants can be regenerated from cultured cells or tissues, 
including but not limited to all major species of sugar cane, sugar beet, cotton, fruit and 
other trees, legumes and vegetables. 

Examples of particularly preferred bacterial host cells include streptococci, 
staphylococci, E. coli, Streptomyces and Bacillus subtilis cells. 

Examples of particularly suitable host cells for fungal expression include yeast cells (for 
example, S. cerevisiae) and Aspergillus cells. 

Any number of selection systems are known in the art that may be used to recover 
transformed cell lines. Examples include the herpes simplex virus thymidine kinase 
(Wigler, M. et al. (1977) Cell 11: 223-32) and adenine phosphoribosyltransferase (Lowy, 
I. et al. (1980) Cell 22: 817-23) genes that can be employed in tk- or aprt- cells, 
respectively. 

Also, antimetabolite, antibiotic or herbicide resistance can be used as the basis for 
selection; for example, dihydrofolate reductase (DHFR) that confers resistance to 
methotrexate (Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. 77: 3567-70); npt, which 
confers resistance to the aminoglycosides neomycin and G-418 (Colbere-Garapin, F. et al. 
(1981) J. Mol. Biol. 150: 1-14) and als or pat, which confer resistance to chlorsulfuron 
and phosphinothricin, respectively (pat encoding phosphinothricin acetyltransferase). Additional selectable genes have 
been described, examples of which will be clear to those of skill in the art. 

Although the presence or absence of marker gene expression suggests that the gene of 
interest is also present, its presence and expression may need to be confirmed. For 
example, if the relevant sequence is inserted within a marker gene sequence, transformed 
cells containing the appropriate sequences can be identified by the absence of marker 
gene function. Alternatively, a marker gene can be placed in tandem with a sequence 
encoding a polypeptide of the invention under the control of a single promoter. 
Expression of the marker gene in response to induction or selection usually indicates 
expression of the tandem gene as well. 

Alternatively, host cells that contain a nucleic acid sequence encoding a polypeptide of 
the invention and which express said polypeptide may be identified by a variety of 
procedures known to those of skill in the art. These procedures include, but are not 
limited to, DNA-DNA or DNA-RNA hybridizations and protein bioassays, for example, 
fluorescence activated cell sorting (FACS) or immunoassay techniques (such as the 
enzyme-linked immunosorbent assay [ELISA] and radioimmunoassay [RIA]), that 
include membrane, solution, or chip based technologies for the detection and/or 
quantification of nucleic acid or protein (see Hampton, R. et al. (1990) Serological 
Methods, a Laboratory Manual, APS Press, St Paul, MN) and Maddox, D.E. et al. (1983) 
J. Exp. Med, 158, 1211-1216). 

A wide variety of labels and conjugation techniques are known by those skilled in the art 
and may be used in various nucleic acid and amino acid assays. Means for producing 
labelled hybridization or PCR probes for detecting sequences related to nucleic acid 
molecules encoding polypeptides of the present invention include oligolabelling, nick 
translation, end-labelling or PCR amplification using a labelled polynucleotide. 
Alternatively, the sequences encoding the polypeptide of the invention may be cloned 
into a vector for the production of an mRNA probe. Such vectors are known in the art, 
are commercially available, and may be used to synthesise RNA probes in vitro by 
addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labelled 
nucleotides. These procedures may be conducted using a variety of commercially 
available kits (Pharmacia & Upjohn (Kalamazoo, MI); Promega (Madison, WI); and U.S. 
Biochemical Corp. (Cleveland, OH)). 

Suitable reporter molecules or labels, which may be used for ease of detection, include 
radionuclides, enzymes and fluorescent, chemiluminescent or chromogenic agents as well 
as substrates, cofactors, inhibitors, magnetic particles, and the like. 

Nucleic acid molecules according to the present invention may also be used to create 
transgenic animals, particularly rodent animals. Such transgenic animals form a further 
aspect of the present invention. This may be done locally by modification of somatic 
cells, or by germ line therapy to incorporate heritable modifications. Such transgenic 
animals may be particularly useful in the generation of animal models for drug molecules 
effective as modulators of the polypeptides of the present invention. 

The polypeptide can be recovered and purified from recombinant cell cultures by 
well-known methods including ammonium sulphate or ethanol precipitation, acid extraction, 
anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography and lectin chromatography. High performance liquid chromatography is 
particularly useful for purification. Well-known techniques for refolding proteins may be 
employed to regenerate an active conformation when the polypeptide is denatured during 
isolation and/or purification. 

Specialised vector constructions may also be used to facilitate purification of proteins, as 
desired, by joining sequences encoding the polypeptides of the invention to a nucleotide 
sequence encoding a polypeptide domain that will facilitate purification of soluble 
proteins. Examples of such purification-facilitating domains include metal chelating 
peptides such as histidine-tryptophan modules that allow purification on immobilised 
metals, protein A domains that allow purification on immobilised immunoglobulin, and 
the domain utilised in the FLAGS extension/affinity purification system (Immunex Corp., 
Seattle, WA). The inclusion of cleavable linker sequences such as those specific for 
Factor Xa or enterokinase (Invitrogen, San Diego, CA) between the purification domain 
and the polypeptide of the invention may be used to facilitate purification. One such 
expression vector provides for expression of a fusion protein containing the polypeptide 
of the invention fused to several histidine residues preceding a thioredoxin or an 
enterokinase cleavage site. The histidine residues facilitate purification by IMAC 
(immobilised metal ion affinity chromatography as described in Porath, J. et al. (1992), 
Prot. Exp. Purif. 3: 263-281) while the thioredoxin or enterokinase cleavage site provides 
a means for purifying the polypeptide from the fusion protein. A discussion of vectors 
which contain fusion proteins is provided in Kroll, D.J. et al. (1993; DNA Cell Biol. 12: 
441-453). 

If the polypeptide is to be expressed for use in screening assays, generally it is preferred 
that it be produced at the surface of the host cell in which it is expressed. In this event, 
the host cells may be harvested prior to use in the screening assay, for example using 
techniques such as fluorescence activated cell sorting (FACS) or immunoaffinity 
techniques. If the polypeptide is secreted into the medium, the medium can be recovered 
in order to recover and purify the expressed polypeptide. If polypeptide is produced 
intracellularly, the cells must first be lysed before the polypeptide is recovered. 

The polypeptide of the invention can be used to screen libraries of compounds in any of a 
variety of drug screening techniques. Such compounds may activate (agonise) or inhibit 
(antagonise) the level of expression of the gene or the activity of the polypeptide of the 
invention. Preferred compounds are effective to alter the expression of a natural gene that 
encodes a polypeptide of the first aspect of the invention or to regulate the activity of a 
polypeptide of the first aspect of the invention. 

Agonist or antagonist compounds may be isolated from, for example, cells, cell-free 
preparations, chemical libraries or natural product mixtures. These agonists or antagonists 
may be natural or modified substrates, ligands, enzymes, receptors or structural or 
functional mimetics. For a suitable review of such screening techniques, see Coligan et 
al., Current Protocols in Immunology 1(2): Chapter 5 (1991). 

Compounds that are most likely to be good antagonists are molecules that bind to the 
polypeptide of the invention without inducing the biological effects of the polypeptide 
upon binding to it. Potential antagonists include small organic molecules, peptides, 
polypeptides and antibodies that bind to the polypeptide of the invention and thereby 
inhibit or extinguish its activity. In this fashion, binding of the polypeptide to normal 
cellular binding molecules may be inhibited, such that the normal biological activity of 
the polypeptide is prevented. 

The polypeptide of the invention that is employed in such a screening technique may be 
free in solution, affixed to a solid support, borne on a cell surface or located 
intracellularly. In general, such screening procedures may involve using appropriate cells 
or cell membranes that express the polypeptide that are contacted with a test compound to 
observe binding, or stimulation or inhibition of a functional response. The functional 
response of the cells contacted with the test compound is then compared with control 
cells that were not contacted with the test compound. Such an assay may assess whether 
the test compound results in a signal generated by activation of the polypeptide, using an 
appropriate detection system. Inhibitors of activation are generally assayed in the 
presence of a known agonist and the effect on activation by the agonist in the presence of 
the test compound is observed. 
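
By way of illustration only, the comparison described above may be expressed as a percentage inhibition of the agonist response. The formula, the function name and the optional basal-signal correction are assumptions made for this sketch; the specification does not prescribe a particular calculation.

```python
def percent_inhibition(agonist_signal: float,
                       test_signal: float,
                       basal_signal: float = 0.0) -> float:
    """Percentage reduction of the agonist-evoked response by a test compound.

    agonist_signal: response with the known agonist alone
    test_signal:    response with the agonist plus the test compound
    basal_signal:   response of untreated control cells (assumed 0 if not measured)
    """
    window = agonist_signal - basal_signal
    if window <= 0:
        raise ValueError("agonist signal must exceed the basal signal")
    return 100.0 * (agonist_signal - test_signal) / window


# Hypothetical readings in arbitrary units
full_block = percent_inhibition(agonist_signal=1000.0, test_signal=250.0)
half_block = percent_inhibition(agonist_signal=1000.0, test_signal=600.0,
                                basal_signal=200.0)
```

A value near zero would suggest that the test compound does not affect activation by the agonist, while a value approaching 100 would suggest that the response has been returned to the basal level.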

Alternatively, simple binding assays may be used, in which the adherence of a test 
compound to a surface bearing the polypeptide is detected by means of a label directly or 
indirectly associated with the test compound or in an assay involving competition with a 
labelled competitor. In another embodiment, competitive drug screening assays may be 
used, in which neutralising antibodies that are capable of binding the polypeptide 
specifically compete with a test compound for binding. In this manner, the antibodies can 
be used to detect the presence of any test compound that possesses specific binding 
affinity for the polypeptide. 

Assays may also be designed to detect the effect of added test compounds on the 
production of mRNA encoding the polypeptide in cells. For example, an ELISA may be 
constructed that measures secreted or cell-associated levels of polypeptide using 
monoclonal or polyclonal antibodies by standard methods known in the art, and this can 
be used to search for compounds that may inhibit or enhance the production of the 
polypeptide from suitably manipulated cells or tissues. The formation of binding 
complexes between the polypeptide and the compound being tested may then be 
measured. 

Another technique for drug screening which may be used provides for high throughput 
screening of compounds having suitable binding affinity to the polypeptide of interest 
(see International patent application WO84/03564). In this method, large numbers of 
different small test compounds are synthesised on a solid substrate, which may then be 
reacted with the polypeptide of the invention and washed. One way of immobilising the 
polypeptide is to use non-neutralising antibodies. Bound polypeptide may then be 
detected using methods that are well known in the art. Purified polypeptide can also be 
coated directly onto plates for use in the aforementioned drug screening techniques. 

The polypeptide of the invention may be used to identify membrane-bound or soluble 
receptors, through standard receptor binding techniques that are known in the art, such as 
ligand binding and crosslinking assays in which the polypeptide is labelled with a 
radioactive isotope, is chemically modified, or is fused to a peptide sequence that 
facilitates its detection or purification, and incubated with a source of the putative 
receptor (for example, a composition of cells, cell membranes, cell supernatants, tissue 
extracts, or bodily fluids). The efficacy of binding may be measured using biophysical 
techniques such as surface plasmon resonance and spectroscopy. Binding assays may be 
used for the purification and cloning of the receptor, but may also identify agonists and 
antagonists of the polypeptide, that compete with the binding of the polypeptide to its 
receptor. Standard methods for conducting screening assays are well understood in the 
art. 

The invention also includes a screening kit useful in the methods for identifying agonists, 
antagonists, ligands, receptors, substrates, and enzymes, that are described above. 

The invention includes the agonists, antagonists, ligands, receptors, substrates and 
enzymes, and other compounds which modulate the activity or antigenicity of the 
polypeptide of the invention discovered by the methods that are described above. 

The invention also provides pharmaceutical compositions comprising a polypeptide, 
nucleic acid, ligand or compound of the invention in combination with a suitable 
pharmaceutical carrier. These compositions may be suitable as therapeutic or diagnostic 
reagents, as vaccines, or as other immunogenic compositions, as outlined in detail below. 

According to the terminology used herein, a composition containing a polypeptide, 
nucleic acid, ligand or compound [X] is "substantially free of" impurities [herein, Y] 
when at least 85% by weight of the total X+Y in the composition is X. Preferably, X 
comprises at least about 90% by weight of the total of X+Y in the composition, more 
preferably at least about 95%, 98% or even 99% by weight. 
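
By way of illustration only, the definition above reduces to a weight fraction X/(X+Y) compared against a threshold. The function names below are assumptions made for this sketch; the 85% default and the 90% preferred value are taken from the definition.

```python
def weight_fraction_x(mass_x: float, mass_y: float) -> float:
    """Weight fraction of X in the total X+Y (masses in any common unit)."""
    total = mass_x + mass_y
    if total <= 0:
        raise ValueError("total mass of X and Y must be positive")
    return mass_x / total


def substantially_free(mass_x: float, mass_y: float,
                       threshold: float = 0.85) -> bool:
    """True when X makes up at least `threshold` of the total X+Y by weight."""
    return weight_fraction_x(mass_x, mass_y) >= threshold


meets_minimum = substantially_free(85, 15)                    # exactly 85% X
fails_minimum = substantially_free(80, 20)                    # only 80% X
meets_preferred = substantially_free(95, 5, threshold=0.90)   # 95% X against 90%
```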

The pharmaceutical compositions should preferably comprise a therapeutically effective 
amount of the polypeptide, nucleic acid molecule, ligand, or compound of the invention. 
The term "therapeutically effective amount" as used herein refers to an amount of a 
therapeutic agent needed to treat, ameliorate, or prevent a targeted disease or condition, 
or to exhibit a detectable therapeutic or preventative effect. For any compound, the 
therapeutically effective dose can be estimated initially either in cell culture assays, for 
example, of neoplastic cells, or in animal models, usually mice, rabbits, dogs, or pigs. 
The animal model may also be used to determine the appropriate concentration range and 
route of administration. Such information can then be used to determine useful doses and 
routes for administration in humans. 

The precise effective amount for a human subject will depend upon the severity of the 
disease state, general health of the subject, age, weight, and gender of the subject, diet, 
time and frequency of administration, drug combination(s), reaction sensitivities, and 
tolerance/response to therapy. This amount can be determined by routine experimentation 
and is within the judgement of the clinician. Generally, an effective dose will be from 
0.01 mg/kg to 50 mg/kg, preferably 0.05 mg/kg to 10 mg/kg. Compositions may be 
administered individually to a patient or may be administered in combination with other 
agents, drugs or hormones. 
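
As an arithmetic illustration only, the per-kilogram ranges stated above can be converted to a total amount for a given body weight. The function name and the 70 kg example weight are assumptions made for this sketch; the actual amount remains a matter for routine experimentation and the judgement of the clinician, as stated above.

```python
def dose_range_mg(weight_kg: float,
                  low_mg_per_kg: float = 0.01,
                  high_mg_per_kg: float = 50.0):
    """Return (minimum, maximum) total dose in mg for the given body weight."""
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (low_mg_per_kg * weight_kg, high_mg_per_kg * weight_kg)


# Hypothetical 70 kg subject
general = dose_range_mg(70.0)                # general range, 0.01 to 50 mg/kg
preferred = dose_range_mg(70.0, 0.05, 10.0)  # preferred range, 0.05 to 10 mg/kg
```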

A pharmaceutical composition may also contain a pharmaceutically acceptable carrier, 
for administration of a therapeutic agent. Such carriers include antibodies and other 
polypeptides, genes and other therapeutic agents such as liposomes, provided that the 
carrier does not itself induce the production of antibodies harmful to the individual 
receiving the composition, and which may be administered without undue toxicity. 

Suitable carriers may be large, slowly metabolised macromolecules such as proteins, 
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid 
copolymers and inactive virus particles. 

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts 
such as hydrochlorides, hydrobromides, phosphates, sulphates, and the like; and the salts 
of organic acids such as acetates, propionates, malonates, benzoates, and the like. A 
thorough discussion of pharmaceutically acceptable carriers is available in Remington's 
Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991). 

Pharmaceutically acceptable carriers in therapeutic compositions may additionally 
contain liquids such as water, saline, glycerol and ethanol. Additionally, auxiliary 
substances, such as wetting or emulsifying agents, pH buffering substances, and the like, 
may be present in such compositions. Such carriers enable the pharmaceutical 
compositions to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, 
slurries, suspensions, and the like, for ingestion by the patient. 

Once formulated, the compositions of the invention can be administered directly to the 
subject. The subjects to be treated can be animals; in particular, human subjects can be 
treated. 

The pharmaceutical compositions utilised in this invention may be administered by any 
number of routes including, but not limited to, oral, intravenous, intramuscular, 
intra-arterial, intramedullary, intrathecal, intraventricular, transdermal or transcutaneous 
applications (for example, see WO98/20734), subcutaneous, intraperitoneal, intranasal, 
enteral, topical, sublingual, intravaginal or rectal means. Gene guns or hyposprays may 
also be used to administer the pharmaceutical compositions of the invention. Typically, 
the therapeutic compositions may be prepared as injectables, either as liquid solutions or 
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to 
injection may also be prepared. 

Direct delivery of the compositions will generally be accomplished by injection, 
subcutaneously, intraperitoneally, intravenously or intramuscularly, or delivered to the 
interstitial space of a tissue. The compositions can also be administered into a lesion. 
Dosage treatment may be a single dose schedule or a multiple dose schedule. 

If the activity of the polypeptide of the invention is in excess in a particular disease state, 
several approaches are available. One approach comprises administering to a subject an 
inhibitor compound (antagonist) as described above, along with a pharmaceutically 
acceptable carrier in an amount effective to inhibit the function of the polypeptide, such 
as by blocking the binding of ligands, substrates, enzymes, receptors, or by inhibiting a 
second signal, and thereby alleviating the abnormal condition. Preferably, such 
antagonists are antibodies. Most preferably, such antibodies are chimeric and/or 
humanised to minimise their immunogenicity, as described previously. 

In another approach, soluble forms of the polypeptide that retain binding affinity for the 
ligand, substrate, enzyme, receptor, in question, may be administered. Typically, the 
polypeptide may be administered in the form of fragments that retain the relevant 
portions. 

In an alternative approach, expression of the gene encoding the polypeptide can be 
inhibited using expression blocking techniques, such as the use of antisense nucleic acid 
molecules (as described above), either internally generated or separately administered. 

Modifications of gene expression can be obtained by designing complementary 
sequences or antisense molecules (DNA, RNA, or PNA) to the control, 5' or regulatory 
regions (signal sequence, promoters, enhancers and introns) of the gene encoding the 
polypeptide. Similarly, inhibition can be achieved using "triple helix" base-pairing 
methodology. Triple helix pairing is useful because it causes inhibition of the ability of 
the double helix to open sufficiently for the binding of polymerases, transcription factors, 
or regulatory molecules. Recent therapeutic advances using triplex DNA have been 
described in the literature (Gee, J.E. et al. (1994) In: Huber, B.E. and B.I. Carr, Molecular 
and Immunologic Approaches, Futura Publishing Co., Mt. Kisco, NY). The 
complementary sequence or antisense molecule may also be designed to block translation 
of mRNA by preventing the transcript from binding to ribosomes. Such oligonucleotides 
may be administered or may be generated in situ from expression in vivo. 
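
By way of illustration only, the design of a complementary (antisense) sequence against a chosen region of a transcript can be sketched as a reverse-complement operation. The function name, the toy transcript and the choice of a window around the start codon are assumptions made for the example; a practical design would also weigh factors such as target accessibility and specificity, which are not modelled here.

```python
# Watson-Crick partners, giving a DNA oligonucleotide for an RNA or DNA target.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(target: str, start: int, length: int) -> str:
    """DNA oligonucleotide, written 5'->3', complementary to
    target[start:start + length] (target written 5'->3')."""
    window = target.upper()[start:start + length]
    if start < 0 or len(window) != length:
        raise ValueError("window lies outside the target sequence")
    return "".join(_COMPLEMENT[base] for base in reversed(window))


# Toy transcript; the window ACCAUGGCU spans the AUG start codon.
toy_mrna = "GGAGCCACCAUGGCUUCA"
oligo = antisense_dna(toy_mrna, start=6, length=9)
```

The resulting oligonucleotide pairs base-for-base with the selected window, which is the property relied upon above for blocking ribosome binding to the transcript.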

In addition, expression of the polypeptide of the invention may be prevented by using 
ribozymes specific to its encoding mRNA sequence. Ribozymes are catalytically active 
RNAs that can be natural or synthetic (see for example Usman, N. et al., Curr. Opin. 
Struct. Biol. (1996) 6(4), 527-33). Synthetic ribozymes can be designed to specifically 
cleave mRNAs at selected positions thereby preventing translation of the mRNAs into 
functional polypeptide. Ribozymes may be synthesised with a natural ribose phosphate 
backbone and natural bases, as normally found in RNA molecules. Alternatively the 
ribozymes may be synthesised with non-natural backbones, for example, 2'-O-methyl 
RNA, to provide protection from ribonuclease degradation and may contain modified 
bases. 

RNA molecules may be modified to increase intracellular stability and half-life. Possible 
modifications include, but are not limited to, the addition of flanking sequences at the 5' 
and/or 3' ends of the molecule or the use of phosphorothioate or 2' O-methyl rather than 
phosphodiester linkages within the backbone of the molecule. This concept is inherent 
in the production of PNAs and can be extended in all of these molecules by the inclusion 
of non-traditional bases such as inosine, queosine and wybutosine, as well as acetyl-, 
methyl-, thio- and similarly modified forms of adenine, cytidine, guanine, thymine and 
uridine which are not as easily recognised by endogenous endonucleases. 

For treating abnormal conditions related to an under-expression of the polypeptide of the 
invention and its activity, several approaches are also available. One approach comprises 
administering to a subject a therapeutically effective amount of a compound that activates 
the polypeptide, i.e., an agonist as described above, to alleviate the abnormal condition. 
Alternatively, a therapeutic amount of the polypeptide in combination with a suitable 
pharmaceutical carrier may be administered to restore the relevant physiological balance 
of polypeptide. 

Gene therapy may be employed to effect the endogenous production of the polypeptide 
by the relevant cells in the subject. Gene therapy is used to treat permanently the 
inappropriate production of the polypeptide by replacing a defective gene with a 
corrected therapeutic gene. 

Gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo gene therapy 
requires the isolation and purification of patient cells, the introduction of a therapeutic 
gene and introduction of the genetically altered cells back into the patient. In contrast, in 
vivo gene therapy does not require isolation and purification of a patient's cells. 

The therapeutic gene is typically "packaged" for administration to a patient. Gene 
delivery vehicles may be non-viral, such as liposomes, or replication-deficient viruses, 
such as adenovirus as described by Berkner, K.L., in Curr. Top. Microbiol. Immunol., 
158, 39-66 (1992) or adeno-associated virus (AAV) vectors as described by Muzyczka, 
N., in Curr. Top. Microbiol. Immunol., 158, 97-129 (1992) and U.S. Patent No. 
5,252,479. For example, a nucleic acid molecule encoding a polypeptide of the invention 
may be engineered for expression in a replication-defective retroviral vector. This 
expression construct may then be isolated and introduced into a packaging cell 
transduced with a retroviral plasmid vector containing RNA encoding the polypeptide, 
such that the packaging cell now produces infectious viral particles containing the gene 
of interest. These producer cells may be administered to a subject for engineering cells in 
vivo and expression of the polypeptide in vivo (see Chapter 20, Gene Therapy and other 
Molecular Genetic-based Therapeutic Approaches, (and references cited therein) in 
Human Molecular Genetics (1996), T Strachan and A P Read, BIOS Scientific Publishers 
Ltd). 

Another approach is the administration of "naked DNA" in which the therapeutic gene is 
directly injected into the bloodstream or muscle tissue. 

In situations in which the polypeptides or nucleic acid molecules of the invention are 
disease causing agents, the invention provides that they can be used in vaccines to raise 
antibodies against the disease causing agent. 

Vaccines according to the invention may either be prophylactic (i.e. to prevent infection) 
or therapeutic (i.e. to treat disease after infection). Such vaccines comprise immunising 
antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid, usually in 
combination with pharmaceutically-acceptable carriers as described above, which include 
any carrier that does not itself induce the production of antibodies harmful to the 
individual receiving the composition. Additionally, these carriers may function as 
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be 
conjugated to a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. 
pylori, and other pathogens. 

Since polypeptides may be broken down in the stomach, vaccines comprising 
polypeptides are preferably administered parenterally (for instance, subcutaneous, 
intramuscular, intravenous, or intradermal injection). Formulations suitable for parenteral 
administration include aqueous and non-aqueous sterile injection solutions which may 
contain anti-oxidants, buffers, bacteriostats and solutes which render the formulation 
isotonic with the blood of the recipient, and aqueous and non-aqueous sterile suspensions 
which may include suspending agents or thickening agents. 

The vaccine formulations of the invention may be presented in unit-dose or multi-dose 
containers, for example, sealed ampoules and vials, and may be stored in a freeze-dried 
condition requiring only the addition of the sterile liquid carrier immediately prior to use. 
The dosage will depend on the specific activity of the vaccine and can be readily 
determined by routine experimentation. 

This invention also relates to the use of nucleic acid molecules according to the present 
invention as diagnostic reagents. Detection of a mutated form of the gene characterised 
by the nucleic acid molecules of the invention which is associated with a dysfunction will 
provide a diagnostic tool that can add to, or define, a diagnosis of a disease, or 
susceptibility to a disease, which results from under-expression, over-expression or 
altered spatial or temporal expression of the gene. Individuals carrying mutations in the 
gene may be detected at the DNA level by a variety of techniques. 

Nucleic acid molecules for diagnosis may be obtained from a subject's cells, such as from 
blood, urine, saliva, tissue biopsy or autopsy material. The genomic DNA may be used 
directly for detection or may be amplified enzymatically by using PCR, ligase chain 
reaction (LCR), strand displacement amplification (SDA), or other amplification 
techniques (see Saiki et al., Nature, 324, 163-166 (1986); Bej, et al., Crit. Rev. Biochem. 
Molec. Biol., 26, 301-334 (1991); Birkenmeyer et al., J. Virol. Meth., 35, 117-126 
(1991); Van Brunt, J., Bio/Technology, 8, 291-294 (1990)) prior to analysis. 

In one embodiment, this aspect of the invention provides a method of diagnosing a 
disease in a patient, comprising assessing the level of expression of a natural gene 
encoding a polypeptide according to the invention and comparing said level of expression 
to a control level, wherein a level that is different to said control level is indicative of 
disease. The method may comprise the steps of: 

a) contacting a sample of tissue from the patient with a nucleic acid probe under stringent 
conditions that allow the formation of a hybrid complex between a nucleic acid 
molecule of the invention and the probe; 

b) contacting a control sample with said probe under the same conditions used in step a); 
and 

c) detecting the presence of hybrid complexes in said samples; 






wherein detection of levels of the hybrid complex in the patient sample that differ from 
levels of the hybrid complex in the control sample is indicative of disease. 

A further aspect of the invention comprises a diagnostic method comprising the steps of: 

a) obtaining a tissue sample from a patient being tested for disease; 

b) isolating a nucleic acid molecule according to the invention from said tissue sample; 
and 

c) diagnosing the patient for disease by detecting the presence of a mutation in the nucleic 
acid molecule which is associated with disease. 

To aid the detection of nucleic acid molecules in the above-described methods, an 
amplification step, for example using PCR, may be included. 

Deletions and insertions can be detected by a change in the size of the amplified product 
in comparison to the normal genotype. Point mutations can be identified by hybridizing 
amplified DNA to labelled RNA of the invention or alternatively, labelled antisense DNA 
sequences of the invention. Perfectly matched sequences can be distinguished from 
mismatched duplexes by RNase digestion or by assessing differences in melting 
temperatures. The presence or absence of the mutation in the patient may be detected by 
contacting DNA with a nucleic acid probe that hybridises to the DNA under stringent 
conditions to form a hybrid double-stranded molecule, the hybrid double-stranded 
molecule having an unhybridised portion of the nucleic acid probe strand at any portion 
corresponding to a mutation associated with disease; and detecting the presence or 
absence of an unhybridised portion of the probe strand as an indication of the presence or 
absence of a disease-associated mutation in the corresponding portion of the DNA strand. 
Such diagnostics are particularly useful for prenatal and even neonatal testing. 

Point mutations and other sequence differences between the reference gene and "mutant" 
genes can be identified by other well-known techniques, such as direct DNA sequencing 
or single-strand conformational polymorphism (see Orita et al., Genomics, 5, 874-879 
(1989)). For example, a sequencing primer may be used with double-stranded PCR 
product or a single-stranded template molecule generated by a modified PCR. The 
sequence determination is performed by conventional procedures with radiolabeled 
nucleotides or by automatic sequencing procedures with fluorescent tags. Cloned DNA 
segments may also be used as probes to detect specific DNA segments. The sensitivity of 
this method is greatly enhanced when combined with PCR. Further, point mutations and 
other sequence variations, such as polymorphisms, can be detected as described above, 
for example, through the use of allele-specific oligonucleotides for PCR amplification of 
sequences that differ by single nucleotides. 
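
By way of illustration only, the comparison logic described above (a size change of the amplified product indicating a deletion or insertion, and single-nucleotide differences indicating point mutations) may be sketched in Python as follows. The function name `compare_amplicons` and the short sequences are hypothetical and are not taken from the invention; the sketch assumes that the reference and sample amplicon sequences have already been determined and, for the equal-length case, that they are directly comparable position by position.

```python
def compare_amplicons(reference: str, sample: str) -> dict:
    """Classify the difference between a reference and a sample amplicon.

    Hypothetical helper for illustration: a length change is reported as a
    deletion or insertion; equal-length sequences are compared base by base.
    """
    reference, sample = reference.upper(), sample.upper()
    size_change = len(sample) - len(reference)
    if size_change < 0:
        return {"kind": "deletion", "size_change": size_change, "substitutions": []}
    if size_change > 0:
        return {"kind": "insertion", "size_change": size_change, "substitutions": []}
    # Equal length: list single-nucleotide differences using 1-based positions.
    substitutions = [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]
    kind = "point mutation" if substitutions else "no difference"
    return {"kind": kind, "size_change": 0, "substitutions": substitutions}


# Illustrative (made-up) sequences only.
result = compare_amplicons("ATGGCCTTA", "ATGGACTTA")
print(result["kind"], result["substitutions"])
```

In practice the size change would be read from a gel or fragment analyser rather than from sequence lengths; the sketch only mirrors the decision logic.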

DNA sequence differences may also be detected by alterations in the electrophoretic 
mobility of DNA fragments in gels, with or without denaturing agents, or by direct DNA 
sequencing (for example, Myers et al., Science (1985) 230: 1242). Sequence changes at 
specific locations may also be revealed by nuclease protection assays, such as RNase and 
S1 protection or the chemical cleavage method (see Cotton et al., Proc. Natl. Acad. Sci. 
USA (1985) 85: 4397-4401). 

In addition to conventional gel electrophoresis and DNA sequencing, mutations such as 
microdeletions, aneuploidies, translocations and inversions can also be detected by in situ 
analysis (see, for example, Keller et al., DNA Probes, 2nd Ed., Stockton Press, New 
York, N.Y., USA (1993)), that is, DNA or RNA sequences in cells can be analysed for 
mutations without need for their isolation and/or immobilisation onto a membrane. 
Fluorescence in situ hybridization (FISH) is presently the most commonly applied 
method and numerous reviews of FISH have appeared (see, for example, Tkachuk et al., 
Science, 250, 559-562 (1990), and Trask et al., Trends Genet., 7, 149-154 (1991)). 

In another embodiment of the invention, an array of oligonucleotide probes comprising a 
nucleic acid molecule according to the invention can be constructed to conduct efficient 
screening of genetic variants, mutations and polymorphisms. Array technology methods 
are well known and have general applicability and can be used to address a variety of 
questions in molecular genetics including gene expression, genetic linkage, and genetic 
variability (see for example: M. Chee et al., Science (1996), Vol 274, pp 610-613). 

In one embodiment, the array is prepared and used according to the methods described in 
PCT application WO95/11995 (Chee et al); Lockhart, D. J. et al. (1996) Nat. Biotech. 14: 
1675-1680; and Schena, M. et al. (1996) Proc. Natl. Acad. Sci. 93: 10614-10619. 
Oligonucleotide pairs may range from two to over one million. The oligomers are 
synthesized at designated areas on a substrate using a light-directed chemical process. 
The substrate may be paper, nylon or other type of membrane, filter, chip, glass slide or 
any other suitable solid support. In another aspect, an oligonucleotide may be synthesized 
on the surface of the substrate by using a chemical coupling procedure and an ink jet 
application apparatus, as described in PCT application WO95/251116 (Baldeschweiler et 
al). In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to 
arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a 
vacuum system, thermal, UV, mechanical or chemical bonding procedures. An array, 
such as those described above, may be produced by hand or by using available devices 
(slot blot or dot blot apparatus), materials (any suitable solid support), and machines 
(including robotic instruments), and may contain 8, 24, 96, 384, 1536 or 6144 
oligonucleotides, or any other number between two and over one million which lends 
itself to the efficient use of commercially-available instrumentation. 

In addition to the methods discussed above, diseases may be diagnosed by methods 
comprising determining, from a sample derived from a subject, an abnormally decreased 
or increased level of polypeptide or mRNA. Decreased or increased expression can be 
measured at the RNA level using any of the methods well known in the art for the 
quantitation of polynucleotides, such as, for example, nucleic acid amplification, for 
instance PCR, RT-PCR, RNase protection, Northern blotting and other hybridization 
methods. 

Assay techniques that can be used to determine levels of a polypeptide of the present 
invention in a sample derived from a host are well-known to those of skill in the art and 
are discussed in some detail above (including radioimmunoassays, competitive-binding 
assays, Western Blot analysis and ELISA assays). This aspect of the invention provides a 
diagnostic method which comprises the steps of: (a) contacting a ligand as described 
above with a biological sample under conditions suitable for the formation of a ligand- 
polypeptide complex; and (b) detecting said complex. 

Protocols such as ELISA, RIA, and FACS for measuring polypeptide levels may 
additionally provide a basis for diagnosing altered or abnormal levels of polypeptide 
expression. Normal or standard values for polypeptide expression are established by 
combining body fluids or cell extracts taken from normal mammalian subjects, preferably 
humans, with antibody to the polypeptide under conditions suitable for complex 
formation. The amount of standard complex formation may be quantified by various 
methods, such as by photometric means. 

Antibodies which specifically bind to a polypeptide of the invention may be used for the 
diagnosis of conditions or diseases characterised by expression of the polypeptide, or in 
assays to monitor patients being treated with the polypeptides, nucleic acid molecules, 
ligands and other compounds of the invention. Antibodies useful for diagnostic purposes 
may be prepared in the same manner as those described above for therapeutics. 

Diagnostic assays for the polypeptide include methods that utilise the antibody and a 
label to detect the polypeptide in human body fluids or extracts of cells or tissues. The 
antibodies may be used with or without modification, and may be labelled by joining 
them, either covalently or non-covalently, with a reporter molecule. A wide variety of 
reporter molecules known in the art may be used, several of which are described above. 

Quantities of polypeptide expressed in subject, control and disease samples from biopsied 
tissues are compared with the standard values. Deviation between standard and subject 
values establishes the parameters for diagnosing disease. Diagnostic assays may be used 
to distinguish between absence, presence, and excess expression of polypeptide and to 
monitor regulation of polypeptide levels during therapeutic intervention. Such assays 
may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in 
animal studies, in clinical trials or in monitoring the treatment of an individual patient. 
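
By way of illustration only, the comparison of a subject value with standard values may be sketched in Python as follows. The text does not specify how much deviation is diagnostic; the sketch therefore assumes, purely for illustration, that the standard range is the mean of the normal samples plus or minus two sample standard deviations. The function names, the threshold `n_sd` and the numerical values are hypothetical.

```python
from statistics import mean, stdev


def standard_range(normal_values, n_sd=2.0):
    """Return (low, high) bounds derived from measurements on normal subjects.

    Assumption for illustration: bounds are mean +/- n_sd sample standard
    deviations. The text does not prescribe this particular rule.
    """
    centre = mean(normal_values)
    spread = stdev(normal_values)
    return centre - n_sd * spread, centre + n_sd * spread


def classify_expression(subject_value, normal_values, n_sd=2.0):
    """Label a subject measurement relative to the standard range."""
    low, high = standard_range(normal_values, n_sd)
    if subject_value < low:
        return "decreased"
    if subject_value > high:
        return "increased"
    return "within standard range"


# Made-up photometric readings (arbitrary units) from normal subjects.
normal_readings = [10.0, 11.0, 9.0, 10.5, 9.5]
print(classify_expression(15.0, normal_readings))
```

The same comparison may be repeated over the course of treatment to follow polypeptide levels in an individual patient.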

A diagnostic kit of the present invention may comprise: 

(a) a nucleic acid molecule of the present invention; 

(b) a polypeptide of the present invention; or 

(c) a ligand of the present invention. 

In one aspect of the invention, a diagnostic kit may comprise a first container containing 
a nucleic acid probe that hybridises under stringent conditions with a nucleic acid 
molecule according to the invention; a second container containing primers useful for 
amplifying the nucleic acid molecule; and instructions for using the probe and primers for 
facilitating the diagnosis of disease. The kit may further comprise a third container 
holding an agent for digesting unhybridised RNA. 

In an alternative aspect of the invention, a diagnostic kit may comprise an array of 
nucleic acid molecules, at least one of which may be a nucleic acid molecule according to 
the invention. 

To detect polypeptide according to the invention, a diagnostic kit may comprise one or 
more antibodies that bind to a polypeptide according to the invention; and a reagent 
useful for the detection of a binding reaction between the antibody and the polypeptide. 

Such kits will be of use in diagnosing a disease or susceptibility to disease, particularly 
cardiovascular disease, cancer, asthma, COPD and inflammatory diseases. 

Various aspects and embodiments of the present invention will now be described in more 
detail by way of example, with particular reference to the ADS1, ADS2, ADS3, ADS4 
and ADS5 polypeptides. 

It will be appreciated that modification of detail may be made without departing from the 
scope of the invention. 

Brief description of the Figures 

Figure 1: This is the front end of the Biopendium™ Target Mining Interface. A search of 
the database is initiated using the PDB code "1LFA:A". 

Figure 2A: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1LFA:A. The arrow indicates leukocyte integrin, a typical adhesion 
molecule. 

Figure 2B: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1LFA:A. The arrow indicates AAC74854.1 (ADS1). 

Figure 2C: Full list of forward PSI-BLAST results for the search using 1LFA:A. 
AAC74854.1 (ADS1) is not identified. 

Figure 3: The Redundant Sequence Display results page for AAC74854.1 (ADS1). 

Figure 4: PFAM search results for AAC74854.1 (ADS1). 

Figure 5: NCBI protein report for AAC74854.1 (ADS1). 

Figure 6A: This is the front end of the Biopendium™ database. A search of the database 
is initiated using AAC74854.1 (ADS1), as the query sequence. 

Figure 6B: A selection of the Inpharmatica Genome Threader results of search using 
AAC74854.1 (ADS1), as the query sequence. The arrow points to 1LFA:A. 

Figure 6C: A selection of the reverse-maximised PSI-BLAST results obtained using 
AAC74854.1 (ADS1), as the query sequence. 

Figure 7: AlEye sequence alignment of AAC74854.1 (ADS1) and 1LFA:A. 

Figure 8A: LigEye for 1LFA:A that illustrates the sites of interaction of the bound metal 
ion required for adhesion activity with the metal binding ligands of the MIDAS motif of 
Homo Sapiens Leukocyte Function Antigen 1, 1LFA:A. 

Figure 8B: iRasMol view of 1LFA:A, Homo Sapiens Leukocyte Function Antigen 1. 
The coloured balls represent the amino acids in Homo Sapiens Leukocyte Function 
Antigen 1 that are involved in the MIDAS motif and that are conserved in AAC74854.1 
(ADS1). 

Figure 9: This is the front end of the Biopendium™ Target Mining Interface. A search of 
the database is initiated using the PDB code "1AOX:A". 

Figure 10A: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1AOX:A. The arrow indicates leukocyte integrin, a typical adhesion 
molecule. 

Figure 10B: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1AOX:A. The arrow indicates AAC76768.1 (ADS2). 

Figure 10C: Full list of forward PSI-BLAST results for the search using 1AOX:A. 
AAC76768.1 (ADS2) is not identified. 

Figure 11: The Redundant Sequence Display results page for AAC76768.1 (ADS2). 

Figure 12: PFAM search results for AAC76768.1 (ADS2). 

Figure 13: NCBI protein report for AAC76768.1 (ADS2). 

Figure 14A: This is the front end of the Biopendium™ database. A search of the database 
is initiated using AAC76768.1 (ADS2), as the query sequence. 

Figure 14B: A selection of the Inpharmatica Genome Threader results of search using 
AAC76768.1 (ADS2), as the query sequence. The arrow points to 1AOX:A. 

Figure 14C: A selection of the reverse-maximised PSI-BLAST results obtained using 
AAC76768.1 (ADS2), as the query sequence. 

Figure 15: AlEye sequence alignment of AAC76768.1 (ADS2) and 1AOX:A. 

Figure 16A: LigEye for 1AOX:A that illustrates the sites of interaction of the bound 
metal ion required for adhesion activity with the metal binding ligands of the MIDAS 
motif of Homo Sapiens Integrin Alpha 2 / Beta 1, 1AOX:A. 

Figure 16B: iRasMol view of 1AOX:A, Homo Sapiens Integrin Alpha 2 / Beta 1. The 
coloured balls represent the amino acids in Homo Sapiens Integrin Alpha 2 / Beta 1 that 
are involved in the MIDAS motif and that are conserved in AAC76768.1 (ADS2). 

Figure 17: This is the front end of the Biopendium™ Target Mining Interface. A search 
of the database is initiated using the PDB code "1JLM". 

Figure 18A: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1JLM. The arrow indicates leukocyte integrin, a typical adhesion molecule. 

Figure 18B: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1JLM. The arrow indicates AAD05645.1 (ADS3). 

Figure 18C: Full list of forward PSI-BLAST results for the search using 1JLM. 
AAD05645.1 (ADS3) is not identified. 

Figure 19: The Redundant Sequence Display results page for AAD05645.1 (ADS3). 

Figure 20: PFAM search results for AAD05645.1 (ADS3). 

Figure 21: NCBI protein report for AAD05645.1 (ADS3). 

Figure 22A: This is the front end of the Biopendium™ database. A search of the database 
is initiated using AAD05645.1 (ADS3), as the query sequence. 






Figure 22B: A selection of the Inpharmatica Genome Threader results of search using 
AAD05645.1 (ADS3), as the query sequence. The arrow points to 1JLM. 

Figure 22C: A selection of the reverse-maximised PSI-BLAST results obtained using 
AAD05645.1 (ADS3), as the query sequence. 

Figure 23: AlEye sequence alignment of AAD05645.1 (ADS3) and 1JLM. 

Figure 24A: LigEye for 1JLM that illustrates the sites of interaction of the bound metal 
ion required for adhesion activity with the metal binding ligands of the MIDAS motif of 
Homo Sapiens Integrin CR3, 1JLM. 

Figure 24B: iRasMol view of 1JLM, Homo Sapiens Integrin CR3. The coloured balls 
represent the amino acids in Homo Sapiens Integrin CR3 that are involved in the MIDAS 
motif and that are conserved in AAD05645.1 (ADS3). 

Figure 25: This is the front end of the Biopendium™ Target Mining Interface. A search 
of the database is initiated using the PDB code "1QC5:A". 

Figure 26A: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1QC5:A. The arrow indicates leukocyte integrin, a typical adhesion 
molecule. 

Figure 26B: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1QC5:A. The arrow indicates CAA57766.1 (ADS4). 

Figure 26C: Full list of forward PSI-BLAST results for the search using 1QC5:A. 
CAA57766.1 (ADS4) is not identified. 

Figure 27: The Redundant Sequence Display results page for CAA57766.1 (ADS4). 

Figure 28: PFAM search results for CAA57766.1 (ADS4). 

Figure 29: NCBI protein report for CAA57766.1 (ADS4). 

Figure 30A: This is the front end of the Biopendium™ database. A search of the database 
is initiated using CAA57766.1 (ADS4), as the query sequence. 

Figure 30B: A selection of the Inpharmatica Genome Threader results of search using 
CAA57766.1 (ADS4), as the query sequence. The arrow points to 1QC5:A. 

Figure 30C: A selection of the reverse-maximised PSI-BLAST results obtained using 
CAA57766.1 (ADS4), as the query sequence. 

Figure 31: AlEye sequence alignment of CAA57766.1 (ADS4) and 1QC5:A. 

Figure 32A: LigEye for 1QC5:A that illustrates the sites of interaction of the bound metal 
ion required for adhesion activity with the metal binding ligands of the MIDAS motif of 
Homo Sapiens Integrin Alpha 1 / Beta 1, 1QC5:A. 

Figure 32B: iRasMol view of 1QC5:A, Homo Sapiens Integrin Alpha 1 / Beta 1. The 
coloured balls represent the amino acids in Homo Sapiens Integrin Alpha 1 / Beta 1 that 
are involved in the MIDAS motif and that are conserved in CAA57766.1 (ADS4). 

Figure 33: This is the front end of the Biopendium™ Target Mining Interface. A search 
of the database is initiated using the PDB code "1JLM". 

Figure 34A: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1JLM. The arrow indicates leukocyte integrin, a typical adhesion molecule. 

Figure 34B: A selection is shown of the Inpharmatica Genome Threader results for the 
search using 1JLM. The arrow indicates P10155 (ADS5). 

Figure 34C: Full list of forward PSI-BLAST results for the search using 1JLM. P10155 
(ADS5) is not identified. 

Figure 35: The Redundant Sequence Display results page for P10155 (ADS5). 

Figure 36: PFAM search results for P10155 (ADS5). 

Figure 37: NCBI protein report for P10155 (ADS5). 

Figure 38A: This is the front end of the Biopendium™ database. A search of the database 
is initiated using P10155 (ADS5), as the query sequence. 

Figure 38B: A selection of the Inpharmatica Genome Threader results of search using 
P10155 (ADS5), as the query sequence. The arrow points to 1JLM. 

Figure 38C: A selection of the reverse-maximised PSI-BLAST results obtained using 
P10155 (ADS5), as the query sequence. 

Figure 39: AlEye sequence alignment of P10155 (ADS5) and 1JLM. 

Figure 40A: LigEye for 1JLM that illustrates the sites of interaction of the bound metal 
ion required for adhesion activity with the metal binding ligands of the MIDAS motif of 
Homo Sapiens Integrin CR3, 1JLM. 

Figure 40B: iRasMol view of 1JLM, Homo Sapiens Integrin CR3. The coloured balls 
represent the amino acids in Homo Sapiens Integrin CR3 that are involved in the MIDAS 
motif and that are partly conserved in P10155 (ADS5). 

Figure 41: AlEye sequence alignment of P10155, Homo Sapiens Ro60 (ADS5), and the 
Mus musculus (AAF19049.1), Xenopus laevis (AAC38001.1) and Caenorhabditis 
elegans (CAA98241.1) Ro60 homologs. 

Examples 

Example 1: AAC74854.1 (ADS1) 

In order to initiate a search for novel, distantly related adhesion molecules, an archetypal 
family member is chosen, the I-domain from Homo Sapiens Leukocyte Function Antigen 
1. More specifically, the search is initiated using a structure from the Protein Data Bank 
(PDB) which is operated by the Research Collaboratory for Structural Bioinformatics. 

The structure chosen is the I-domain from Homo Sapiens Leukocyte Function Antigen 1 
(PDB code 1LFA:A; see Figure 1). 

A search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1LFA:A takes place and returns 2729 Genome Threader results. The 2729 Genome 
Threader results include examples of typical adhesion molecules, such as leukocyte 
integrin alpha chain (see arrow in Figure 2A). 

Among the known adhesion molecules appears a protein of apparently unknown function, 
AAC74854.1 (ADS1; see arrow in Figure 2B). The Inpharmatica Genome Threader has 
identified a sequence, AAC74854.1 (ADS1), as having a structure similar to Homo 
Sapiens Leukocyte Function Antigen 1, an adhesion molecule. The possession of a 
structure similar to an adhesion molecule suggests that AAC74854.1 (ADS1) functions as 
an adhesion molecule. The Genome Threader identifies this with 95% confidence. 

The search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1LFA:A also returns 630 Forward PSI-Blast results. Forward PSI-Blast (see Figure 2C) is 
unable to identify this relationship; only the Inpharmatica Genome Threader is able to 
identify AAC74854.1 (ADS1) as an adhesion molecule. 

In order to assess what is known in the public domain databases about AAC74854.1 
(ADS1) the Redundant Sequence Display Page (Figure 3) is viewed. There are no 
associated PROSITE or PRINTS hits for AAC74854.1 (ADS1). PROSITE and PRINTS 
are databases that help to describe proteins of similar families. Returning no hits from 
both databases means that AAC74854.1 (ADS1) is unidentifiable as an adhesion 
molecule using PROSITE or PRINTS. The redundant sequence display also shows any 
predicted features of AAC74854.1 (ADS1). These include potential coiled coil and low 
complexity regions in the sequence. 

In order to identify if any other public domain annotation vehicle is able to annotate 
AAC74854.1 (ADS1) as an adhesion molecule, the AAC74854.1 (ADS1) protein 
sequence is searched against the PFAM database (Protein Family Database of Alignment 
and hidden Markov models) (see Figure 4). The results identify one PFAM-B match to 
AAC74854.1; however, PFAM-B matches confer no functional annotation, only sequence 
similarity to other functionally unannotated proteins. Thus PFAM does not identify 
AAC74854.1 (ADS1) as an adhesion molecule. 

The National Center for Biotechnology Information (NCBI) Genbank protein database is 
then viewed to examine if there is any further information that is known in the public 
domain relating to AAC74854.1 (ADS1). This is the U.S. public domain database for 
protein and gene sequence deposition (Figure 5). AAC74854.1 (ADS1) is an Escherichia 
coli sequence, its Genbank protein ID is AAC74854.1 and it is 427 amino acids in 
length. AAC74854.1 (ADS1) was cloned by a group of scientists at the University of 
Wisconsin, U.S.A. The entry identifies AAC74854.1 (ADS1) as a hypothetical protein. 
The public domain information for this gene does not annotate it as an adhesion 
molecule. 

Therefore, it can be concluded that using all public domain annotation tools, 
AAC74854.1 (ADS1) may not be annotated as an adhesion molecule. Only the 
Inpharmatica Genome Threader is able to annotate this protein as an adhesion molecule. 

The reverse search is now carried out. AAC74854.1 (ADS1) is now used as the query 
sequence in the Biopendium™ (see Figure 6A). The Inpharmatica Genome Threader 
identifies AAC74854.1 (ADS1) as having a structure that is the same as Homo Sapiens 
Leukocyte Function Antigen 1 with 95% confidence (see arrow in Figure 6B). Homo 
Sapiens Leukocyte Function Antigen 1 (1LFA) was the original query sequence. Positive 
iterations of PSI-Blast do not return this result (Figure 6C). It is only the Inpharmatica 
Genome Threader that is able to identify this relationship. 
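
By way of illustration only, the forward and reverse searches described in this Example may be viewed as a reciprocal check: the candidate is found when the known structure is used as the query, and the known structure is found when the candidate is used as the query. A minimal Python sketch follows. It does not represent the Biopendium™ or the Genome Threader, whose interfaces are not described here; the dictionaries mapping identifiers to confidence values, the function name and the 0.90 threshold are all assumptions made for the purpose of the sketch. Only the 95% confidence figures are taken from the text.

```python
def reciprocal_support(forward_hits, reverse_hits, template, candidate,
                       min_confidence=0.90):
    """Return True if the relationship is seen in both search directions.

    forward_hits: hypothetical mapping of hit identifier -> confidence for
                  the search that used `template` as the query.
    reverse_hits: hypothetical mapping of hit identifier -> confidence for
                  the search that used `candidate` as the query.
    """
    forward_confidence = forward_hits.get(candidate, 0.0)
    reverse_confidence = reverse_hits.get(template, 0.0)
    return (forward_confidence >= min_confidence
            and reverse_confidence >= min_confidence)


# Confidence values as reported in the text for this Example.
forward = {"AAC74854.1": 0.95}   # query: 1LFA:A
reverse = {"1LFA:A": 0.95}       # query: AAC74854.1

print(reciprocal_support(forward, reverse, "1LFA:A", "AAC74854.1"))
```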

The Homo Sapiens Leukocyte Function Antigen 1 sequence is chosen against which to 
view the sequence alignment of AAC74854.1 (ADS1). Viewing the AlEye alignment 
(Figure 7) of the query protein against the protein identified as being of a similar 
structure helps to visualize the areas of homology. 

The Leukocyte Function Antigen 1 I domain requires a bound metal ion in order to 
function. The metal ion forms a Metal Ion-Dependent Adhesion Site (MIDAS) which is 
characterised by a MIDAS motif consisting of the conserved metal liganding residues. 
The MIDAS motif in 1LFA:A consists of ASP10, SER12, SER14, THR79 and ASP112; 
all these residues are conserved in AAC74854.1 (ADS1) as ASP256, SER258, SER260, 
THR315 and ASP346 respectively. The two serines and ASP112 are the metal ion 
ligands. This indicates that AAC74854.1 (ADS1) is an adhesion molecule similar to 
Leukocyte Function Antigen 1. 

In order to ensure that the protein identified is in fact a relative of the query sequence, the 
visualization programs "LigEye" (Figure 8A) and "iRasmol" (Figure 8B) are used. These 
visualization tools identify the metal binding site of known protein structures by 
indicating the amino acids with which known metal ions or small molecule inhibitors 
interact at the active site. These interactions are through either a direct hydrogen bond or 
through hydrophobic interactions. In this manner one can see if the active site 
fold/structure is conserved between the identified homologue and the chosen protein of 
known structure. 

Since the structure of Homo Sapiens Leukocyte Function Antigen 1 is known (1LFA), 
this is chosen to illustrate the MIDAS motif (Figure 8B). ASP10, SER12, SER14, THR79 
and ASP112 of 1LFA:A align with ASP256, SER258, SER260, THR315 and ASP346 of 
AAC74854.1. This indicates that indeed as predicted by the Inpharmatica Genome 
Threader, AAC74854.1 (ADS1) folds in a similar manner to Homo Sapiens Leukocyte 
Function Antigen 1 and as such is identified as an adhesion molecule. 

Example 2: AAC76768.1 (ADS2) 

In order to initiate a search for novel, distantly related adhesion molecules, an archetypal 
family member is chosen, the I-domain from Homo Sapiens Integrin Alpha 2 / Beta 1. 
More specifically, the search is initiated using a structure from the Protein Data Bank 
(PDB) which is operated by the Research Collaboratory for Structural Bioinformatics. 

The structure chosen is the I-domain from Homo Sapiens Integrin Alpha 2 / Beta 1 (PDB 
code 1AOX; see Figure 9). 

A search of the Biopendium™ (using the Target Mining Interface) for relatives of 1AOX 
takes place and returns 2394 Genome Threader results. The 2394 Genome Threader 
results include examples of typical adhesion molecules, such as Integrin alpha 11 (see 
arrow in Figure 10A). 

Among the known adhesion molecules appears a protein of apparently unknown function, 
AAC76768.1 (ADS2; see arrow in Figure 10B). The Inpharmatica Genome Threader has 
identified a sequence, AAC76768.1 (ADS2), as having a structure similar to Homo 
Sapiens Integrin Alpha 2 / Beta 1, an adhesion molecule. The possession of a structure 
similar to an adhesion molecule suggests that AAC76768.1 (ADS2) functions as an 
adhesion molecule. The Genome Threader identifies this with 100% confidence. 

The search of the Biopendium™ (using the Target Mining Interface) for homologues of 
1AOX also returns 24 Reverse PSI-Blast results. The Inpharmatica Reverse PSI-Blast 
identifies AAC76768.1 (ADS2) as being related in sequence to Homo Sapiens Integrin 
Alpha 2 / Beta 1, detected in the -4 iteration (see Figure 10B, circled). The possession of 
a sequence related to an adhesion molecule suggests that AAC76768.1 (ADS2) functions 
as an adhesion molecule. This second proprietary method result consolidates the Homo 
Sapiens Integrin Alpha 2 / Beta 1 structural relationship demonstrated with Genome 
Threader. 

The search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1AOX also returns 608 Forward PSI-Blast results. Forward PSI-Blast (see Figure 10C) is 
unable to identify this relationship; only the Inpharmatica Genome Threader is able to 
identify AAC76768.1 (ADS2) as an adhesion molecule. 

In order to assess what is known in the public domain databases about AAC76768.1 
(ADS2) the Redundant Sequence Display Page (Figure 11) is viewed. There are no 
associated PROSITE or PRINTS hits for AAC76768.1 (ADS2). PROSITE and PRINTS 
are databases that help to describe proteins of similar families. Returning no hits from 
both databases means that AAC76768.1 (ADS2) is unidentifiable as an adhesion 
molecule using PROSITE or PRINTS. The redundant sequence display also shows any 
predicted features of AAC76768.1 (ADS2). These include a potential coiled coil in the 
sequence. 

In order to identify if any other public domain annotation vehicle is able to annotate 
AAC76768.1 (ADS2) as an adhesion molecule, the AAC76768.1 (ADS2) protein 
sequence is searched against the PFAM database (Protein Family Database of Alignment 
and hidden Markov models) (see Figure 12). The results identify one PFAM-B match to 
AAC76768.1; however, PFAM-B matches confer no functional annotation, only sequence 
similarity to other functionally unannotated proteins. Thus PFAM does not identify 
AAC76768.1 (ADS2) as an adhesion molecule. 

The National Center for Biotechnology Information (NCBI) Genbank protein database is 
then viewed to examine if there is any further information that is known in the public 
domain relating to AAC76768.1 (ADS2). This is the U.S. public domain database for 
protein and gene sequence deposition (Figure 13). AAC76768.1 (ADS2) is a Homo 
Sapiens sequence, its Genbank protein ID is AAC76768.1 and it is 427 amino acids in 
length. AAC76768.1 (ADS2) was cloned by a group of scientists at the University of 
Wisconsin, USA. The entry identifies AAC76768.1 (ADS2) as a hypothetical protein. 
The public domain information for this gene does not annotate it as an adhesion 
molecule. 

Therefore, it can be concluded that using all public domain annotation tools, 
AAC76768.1 (ADS2) may not be annotated as an adhesion molecule. Only the 
Inpharmatica Genome Threader is able to annotate this protein as an adhesion molecule. 

The reverse search is now carried out. AAC76768.1 (ADS2) is now used as the query 
sequence in the Biopendium™ (see Figure 14A). The Inpharmatica Genome Threader 
identifies AAC76768.1 (ADS2) as having a structure that is the same as Homo sapiens 
Integrin Alpha 2 / Beta 1 with 100% confidence (see arrow in Figure 14B). Homo sapiens 
Integrin Alpha 2 / Beta 1 (1AOX) was the original query sequence. The first 3 iterations 
of positive PSI-Blast do not return this result (Figure 14C); adhesion molecules are only 
detected at iteration 4 and above. It is only the Inpharmatica Genome Threader that is 
able to identify this relationship. 

The Homo sapiens Integrin Alpha 2 / Beta 1 sequence is chosen against which to view 
the sequence alignment of AAC76768.1 (ADS2). Viewing the AlEye alignment (Figure 
15) of the query protein against the protein identified as being of a similar structure helps 
to visualize the areas of homology. 

The Integrin Alpha 2 / Beta 1 I domain requires a bound metal ion in order to function. 
The metal ion forms a Metal Ion-Dependent Adhesion Site (MIDAS) which is 
characterised by a MIDAS motif consisting of the conserved metal liganding residues. 
The MIDAS motif in 1AOX consists of ASP13, SER15, SER17, THR83 and ASP116; all 
these residues are conserved in AAC76768.1 (ADS2) as ASP271, SER273, SER275, 
THR337 and ASP365 respectively. The two serines and ASP116 are the metal ion 
ligands. This indicates that AAC76768.1 (ADS2) is an adhesion molecule similar to 
Integrin Alpha 2 / Beta 1. 
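
The first three MIDAS residues quoted above (ASP13, SER15 and SER17 in 1AOX; 
ASP271, SER273 and SER275 in AAC76768.1) are spaced as a D-x-S-x-S pattern. The 
following is a minimal sketch in Python, assuming only the ten residues numbered 271 to 
280 in SEQ ID NO: 4. The function name and the regular expression are illustrative 
assumptions and are not part of the Biopendium or the Genome Threader. 

```python
import re

# Residues 271-280 of AAC76768.1 as given in SEQ ID NO: 4.
FRAGMENT_START = 271
FRAGMENT = "dtsgsmggfn"

# D-x-S-x-S: aspartate, any residue, serine, any residue, serine.
# The lookahead lets overlapping occurrences be reported as well.
DXSXS = re.compile(r"(?=(D.S.S))")


def find_dxsxs(fragment, start):
    """Return 1-based positions of the D, S and S of every D-x-S-x-S match.

    `start` is the sequence number of the first residue of `fragment`.
    """
    hits = []
    for match in DXSXS.finditer(fragment.upper()):
        offset = match.start(1)
        hits.append((start + offset, start + offset + 2, start + offset + 4))
    return hits


print(find_dxsxs(FRAGMENT, FRAGMENT_START))
```

A match on this short pattern alone would not establish function; in the text it is the 
structure-based alignment that ties these residues to the I-domain fold. 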

In order to ensure that the protein identified is in fact a relative of the query sequence, the 
visualization programs "LigEye" (Figure 16A) and "iRasmol" (Figure 16B) are used. 
These visualization tools identify the metal binding site of known protein structures by 
indicating the amino acids with which known metal ions or small molecule inhibitors 
interact at the active site. These interactions are through either a direct hydrogen bond or 
through hydrophobic interactions. In this manner one can see if the active site 
fold/structure is conserved between the identified homologue and the chosen protein of 
known structure. 

Since the structure of Homo sapiens Integrin Alpha 2 / Beta 1 is known (1AOX), this is 
chosen to illustrate the MIDAS motif (Figure 16B). ASP13, SER15, SER17, THR83 and 
ASP116 of 1AOX align with ASP271, SER273, SER275, THR337 and ASP365 of 
AAC76768.1. This indicates that, as predicted by the Inpharmatica Genome 
Threader, AAC76768.1 (ADS2) folds in a similar manner to Homo sapiens Integrin 
Alpha 2 / Beta 1 and as such is identified as an adhesion molecule. 

Example 3: AAD05645.1 (ADS3) 

In order to initiate a search for novel, distantly related adhesion molecules, an archetypal 
family member is chosen, the I-domain from Homo sapiens Integrin CR3. More 
specifically, the search is initiated using a structure from the Protein Data Bank (PDB) 
which is operated by the Research Collaboratory for Structural Bioinformatics. 
The structure chosen is the I-domain from Homo sapiens Integrin CR3 (PDB code 1JLM; 
see Figure 17). 

A search of the Biopendium™ (using the Target Mining Interface) for relatives of 1JLM 
takes place and returns 2925 Genome Threader results. The 2925 Genome Threader 
results include examples of typical adhesion molecules, such as leukocyte integrin (see 
arrow in Figure 18A). 

Among the known adhesion molecules appears a protein of apparently unknown function, 
AAD05645.1 (ADS3; see arrow in Figure 18B). The Inpharmatica Genome Threader has 
identified a sequence, AAD05645.1 (ADS3), as having a structure similar to Homo 
sapiens Integrin CR3, an adhesion molecule. The possession of a structure similar to an 
adhesion molecule suggests that AAD05645.1 (ADS3) functions as an adhesion 
molecule. The Genome Threader identifies this with 89% confidence. 

The search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1JLM also returns 626 Forward PSI-Blast results. Forward PSI-Blast (see figure 18C) is 
unable to identify this relationship; only the Inpharmatica Genome Threader is able to 
identify AAD05645.1 (ADS3) as an adhesion molecule. 

In order to assess what is known in the public domain databases about AAD05645.1 
(ADS3), the Redundant Sequence Display Page (Figure 19) is viewed. There are no 
associated PROSITE or PRINTS hits for AAD05645.1 (ADS3). PROSITE and PRINTS 
are databases that help to describe proteins of similar families. Returning no hits from 
either database means that AAD05645.1 (ADS3) is unidentifiable as an adhesion 
molecule using PROSITE or PRINTS. The redundant sequence display also shows any 
predicted features of AAD05645.1 (ADS3). These include a potential coiled coil region in 
the middle of the sequence. 

In order to identify whether any other public domain annotation vehicle is able to annotate 
AAD05645.1 (ADS3) as an adhesion molecule, the AAD05645.1 (ADS3) protein 
sequence is searched against the PFAM database (Protein Family Database of Alignment 
and hidden Markov models) (see Figure 20A). The results identify two PFAM-B 
matches to AAD05645.1; however, PFAM-B matches confer no functional annotation, 
only sequence similarity to other functionally unannotated proteins. The results also 
identify one PFAM-A match; however, this is to the FtsK/SpoIIIE family (see Figure 
20B), which does not confer any particular functional annotation. Thus PFAM does not 
identify AAD05645.1 (ADS3) as an adhesion molecule. 

The National Center for Biotechnology Information (NCBI) Genbank protein database is 
then viewed to examine whether any further information is known in the public 
domain relating to AAD05645.1 (ADS3). This is the U.S. public domain database for 
protein and gene sequence deposition (Figure 21). AAD05645.1 (ADS3) is a Homo 
sapiens sequence; its Genbank protein ID is AAD05645.1 and it is 806 amino acids in 
length. AAD05645.1 (ADS3) was cloned by a group of scientists at the Astra Research 
Center Boston in Cambridge, USA. The entry identifies AAD05645.1 (ADS3) as a 
putative protein. The public domain information for this gene does not annotate it as an 
adhesion molecule. 

Therefore, it can be concluded that, using all public domain annotation tools, 
AAD05645.1 (ADS3) cannot be annotated as an adhesion molecule. Only the 
Inpharmatica Genome Threader is able to annotate this protein as an adhesion molecule. 

The reverse search is now carried out. AAD05645.1 (ADS3) is now used as the query 
sequence in the Biopendium™ (see Figure 22A). The Inpharmatica Genome Threader 
identifies AAD05645.1 (ADS3) as having a structure that is the same as Homo sapiens 
Integrin CR3 with 89% confidence (see arrow in Figure 22B). Homo sapiens Integrin 
CR3 (1JLM) was the original query sequence. Positive iterations of PSI-Blast do not 
return this result (Figure 22C). It is only the Inpharmatica Genome Threader that is able 
to identify this relationship. 

The Homo sapiens Integrin CR3 sequence is chosen against which to view the sequence 
alignment of AAD05645.1 (ADS3). Viewing the AlEye alignment (Figure 23) of the 
query protein against the protein identified as being of a similar structure helps to 
visualize the areas of homology. 

The Integrin CR3 I domain requires a bound metal ion in order to function. The metal ion 
forms a Metal Ion-Dependent Adhesion Site (MIDAS) which is characterised by a 
MIDAS motif consisting of the conserved metal liganding residues. The MIDAS motif in 
1JLM consists of ASP9, SER11, SER13, THR78 and ASP111; all these residues are 
conserved in AAD05645.1 (ADS3) as ASP350, SER352, SER354, THR415 and ASP464 
respectively. The two serines and ASP111 are the metal ion ligands. This indicates that 
AAD05645.1 (ADS3) is an adhesion molecule similar to Integrin CR3. 

In order to ensure that the protein identified is in fact a relative of the query sequence, the 
visualization programs "LigEye" (Figure 24A) and "iRasmol" (Figure 24B) are used. 
These visualization tools identify the metal binding site of known protein structures by 
indicating the amino acids with which known metal ions or small molecule inhibitors 
interact at the active site. These interactions are through either a direct hydrogen bond or 
through hydrophobic interactions. In this manner one can see if the active site 
fold/structure is conserved between the identified homologue and the chosen protein of 
known structure. 

Since the structure of Homo sapiens Integrin CR3 is known (1JLM), this is chosen to 
illustrate the MIDAS motif (Figure 24B). ASP9, SER11, SER13, THR78 and ASP111 of 
1JLM align with ASP350, SER352, SER354, THR415 and ASP464 of AAD05645.1. 
This indicates that, as predicted by the Inpharmatica Genome Threader, 
AAD05645.1 (ADS3) folds in a similar manner to Homo sapiens Integrin CR3 and as 
such is identified as an adhesion molecule. 

Example 4: VP4 (ADS4) 

In order to initiate a search for novel, distantly related adhesion molecules, an archetypal 
family member is chosen, the I-domain from Homo Sapiens Integrin Alpha 1 / Beta 1. 
More specifically, the search is initiated using a structure from the Protein Data Bank 
(PDB) which is operated by the Research Collaboratory for Structural Bioinformatics. 

The structure chosen is the I-domain from Homo Sapiens Integrin Alpha 1 / Beta 1 (PDB 
code 1QC5; see Figure 25). 

A search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1QC5:A takes place and returns 3331 Genome Threader results. The 3331 Genome 
Threader results include examples of typical adhesion molecules, such as leukocyte 
integrin (see arrow in Figure 26A). 

Among the known adhesion molecules appears a protein of apparently unknown function, 
VP4 (ADS4; see arrow in figure 26B). The Inpharmatica Genome Threader has identified 
a sequence, VP4 (ADS4), as having a structure similar to Homo Sapiens Integrin Alpha 1 
/ Beta 1, an adhesion molecule. The possession of a structure similar to an adhesion 
molecule suggests that VP4 (ADS4) functions as an adhesion molecule. The Genome 
Threader identifies this with 48% confidence. 

The search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1QC5:A also returns 622 Forward PSI-Blast results. Forward PSI-Blast (see figure 26C) 
is unable to identify this relationship; only the Inpharmatica Genome Threader is able to 
identify VP4 (ADS4) as an adhesion molecule. 

In order to assess what is known in the public domain databases about VP4 (ADS4), the 
Redundant Sequence Display Page (Figure 27) is viewed. There are no associated 
PROSITE or PRINTS hits for VP4 (ADS4). PROSITE and PRINTS are databases that 
help to describe proteins of similar families. Returning no hits from either database means 
that VP4 (ADS4) is unidentifiable as an adhesion molecule using PROSITE or PRINTS. 
The redundant sequence display also shows any predicted features of CAA57766.1 
(ADS4). These include a potential coiled coil region at the start of the sequence. 

In order to identify whether any other public domain annotation vehicle is able to annotate VP4 
(ADS4) as an adhesion molecule, the VP4 (ADS4) protein sequence is searched against 
the PFAM database (Protein Family Database of Alignment and hidden Markov models) 
(see Figure 28). The results identify one PFAM-A match to VP4; however, the PFAM-A 
match is to the VP4 domain, which only annotates the protein as a VP4 protein and 
confers no additional functional annotation. Thus PFAM does not identify VP4 (ADS4) 
as an adhesion molecule. 

The National Center for Biotechnology Information (NCBI) Genbank protein database is 
then viewed to examine whether any further information is known in the public 
domain relating to VP4 (ADS4). This is the U.S. public domain database for protein and 
gene sequence deposition (Figure 29). VP4 (ADS4) is a Human Rotavirus sequence; its 
Genbank protein ID is CAA57766.1 and it is 775 amino acids in length. VP4 (ADS4) 
was cloned by a group of scientists at the Indian Institute of Science, Bangalore, India. 
The entry identifies VP4 (ADS4) as a coat protein of Human Rotavirus. The public 
domain information for this gene does not annotate it as an adhesion molecule. 

Therefore, it can be concluded that, using all public domain annotation tools, VP4 (ADS4) 
cannot be annotated as an adhesion molecule. Only the Inpharmatica Genome Threader 
is able to annotate this protein as an adhesion molecule. 

The reverse search is now carried out. CAA57766.1 (ADS4) is now used as the query 
sequence in the Biopendium™ (see Figure 30A). The Inpharmatica Genome Threader 
identifies VP4 (ADS4) as having a structure that is the same as Homo sapiens Integrin 
Alpha 1 / Beta 1 with 48% confidence (see arrow in Figure 30B). Homo sapiens Integrin 
Alpha 1 / Beta 1 (1QC5) was the original query sequence. The Von Willebrand domains 
that are also hit by VP4 are likewise adhesion molecules with a very similar structure to the 
integrin I domain. Positive iterations of PSI-Blast do not return this result (Figure 30C). It is 
only the Inpharmatica Genome Threader that is able to identify this relationship. 

The Homo sapiens Integrin Alpha 1 / Beta 1 sequence is chosen against which to view 
the sequence alignment of VP4 (ADS4). Viewing the AlEye alignment (Figure 31) of the 
query protein against the protein identified as being of a similar structure helps to 
visualize the areas of homology. 

The Integrin Alpha 1 / Beta 1 I domain requires a bound metal ion in order to function. 
The metal ion forms a Metal Ion-Dependent Adhesion Site (MIDAS) which is 
characterised by a MIDAS motif consisting of the conserved metal liganding residues. 
The MIDAS motif in 1QC5:A consists of ASP11, SER13, SER15, THR81 and ASP114. 
The first four of these residues are conserved in VP4 (ADS4) as ASP592, SER594, 
SER596 and THR654 respectively, and ASP114 is replaced by GLU686. The change of 
ASP114 to GLU686 is a conservative change, as both residues have acidic groups 
capable of binding metal. The two serines and ASP114 are the metal ion ligands. This 
indicates that VP4 (ADS4) is an adhesion molecule similar to Integrin Alpha 1 / Beta 1. 
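
The distinction drawn above between an identical residue and a conservative change can 
be sketched as follows. This is a minimal illustration in Python; the coarse side-chain 
classes in RESIDUE_CLASS and the function name are assumptions made for this example 
and are not the scoring used by the Inpharmatica Genome Threader. 

```python
# Coarse side-chain classes; this grouping is an assumption for illustration.
RESIDUE_CLASS = {
    "ASP": "acidic", "GLU": "acidic",
    "SER": "hydroxyl", "THR": "hydroxyl",
    "LYS": "basic", "ARG": "basic", "HIS": "basic",
    "ALA": "aliphatic", "VAL": "aliphatic", "LEU": "aliphatic", "ILE": "aliphatic",
}

# MIDAS residues of 1QC5:A and the aligned VP4 residues, as listed in the text.
MIDAS_PAIRS = [
    ("ASP11", "ASP592"),
    ("SER13", "SER594"),
    ("SER15", "SER596"),
    ("THR81", "THR654"),
    ("ASP114", "GLU686"),
]


def classify(template, query):
    """Compare two residue labels such as 'ASP114' and 'GLU686' by residue type."""
    t_name, q_name = template[:3], query[:3]
    if t_name == q_name:
        return "identical"
    t_class = RESIDUE_CLASS.get(t_name)
    if t_class is not None and t_class == RESIDUE_CLASS.get(q_name):
        return "conservative"
    return "non-conservative"


for template, query in MIDAS_PAIRS:
    print(template, query, classify(template, query))
```

Under these assumed classes, only the ASP114/GLU686 pair is reported as a conservative 
change; the other four pairs are identical residue types. 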

In order to ensure that the protein identified is in fact a relative of the query sequence, the 
visualization programs "LigEye" (Figure 32A) and "iRasmol" (Figure 32B) are used. 
These visualization tools identify the metal binding site of known protein structures by 
indicating the amino acids with which known metal ions or small molecule inhibitors 
interact at the active site. These interactions are through either a direct hydrogen bond or 
through hydrophobic interactions. In this manner one can see if the active site 
fold/structure is conserved between the identified homologue and the chosen protein of 
known structure. 

Since the structure of Homo sapiens Integrin Alpha 1 / Beta 1 is known (1QC5), this is 
chosen to illustrate the MIDAS motif (Figure 32B). ASP11, SER13, SER15, THR81 
and ASP114 of 1QC5:A align with ASP592, SER594, SER596, THR654 and GLU686 of 
VP4. This indicates that, as predicted by the Inpharmatica Genome Threader, VP4 
(ADS4) folds in a similar manner to Homo sapiens Integrin Alpha 1 / Beta 1 and as such 
is identified as an adhesion molecule. 

Example 5: Ro60 (ADS5) 

In order to initiate a search for novel, distantly related adhesion molecules, an archetypal 
family member is chosen, the I-domain from Homo Sapiens Integrin CR3. More 
specifically, the search is initiated using a structure from the Protein Data Bank (PDB) 
which is operated by the Research Collaboratory for Structural Bioinformatics. 
The structure chosen is the I-domain from Homo sapiens Integrin CR3 (PDB code 1JLM; 
see Figure 33). 

A search of the Biopendium™ (using the Target Mining Interface) for relatives of 1JLM 
takes place and returns 2925 Genome Threader results. The 2925 Genome Threader 
results include examples of typical adhesion molecules, such as leukocyte integrin (see 
arrow in Figure 34A). 

Among the known adhesion molecules appears a protein of apparently unknown function, 
Ro60 (ADS5; see arrow in Figure 34B). The Inpharmatica Genome Threader has 
identified a sequence, Ro60 (ADS5), as having a structure similar to Homo sapiens 
Integrin CR3, an adhesion molecule. The possession of a structure similar to an adhesion 
molecule suggests that Ro60 (ADS5) functions as an adhesion molecule. The Genome 
Threader identifies this with 70% confidence. 

The search of the Biopendium™ (using the Target Mining Interface) for relatives of 
1JLM also returns 626 Forward PSI-Blast results. Forward PSI-Blast (see figure 34C) is 
unable to identify this relationship; only the Inpharmatica Genome Threader is able to 
identify Ro60 (ADS5) as an adhesion molecule. 

In order to assess what is known in the public domain databases about Ro60 (ADS5), the 
Redundant Sequence Display Page (Figure 35) is viewed. There are no associated 
PROSITE or PRINTS hits for Ro60 (ADS5). PROSITE and PRINTS are databases that 
help to describe proteins of similar families. Returning no hits from either database means 
that Ro60 (ADS5) is unidentifiable as an adhesion molecule using PROSITE or PRINTS. 
The redundant sequence display also shows any predicted features of Ro60 (ADS5). 
These include a potential coiled coil region at the start of the sequence and a 
transmembrane region. Although transmembrane regions are not predictive of adhesion 
molecules, they are a common characteristic of adhesion molecules. Thus the possession 
of a transmembrane region is consistent with the Inpharmatica Genome Threader 
annotation of Ro60 (ADS5) as an adhesion molecule. 

In order to identify whether any other public domain annotation vehicle is able to annotate Ro60 
(ADS5) as an adhesion molecule, the Ro60 (ADS5) protein sequence is searched against 
the PFAM database (Protein Family Database of Alignment and hidden Markov models) 
(see Figure 36). The results identify two PFAM-B matches to Ro60; however, PFAM-B 
matches confer no functional annotation, only sequence similarity to other functionally 
unannotated proteins. Thus PFAM does not identify Ro60 (ADS5) as an adhesion 
molecule. 

The National Center for Biotechnology Information (NCBI) Genbank protein database is 
then viewed to examine whether any further information is known in the public 
domain relating to Ro60 (ADS5). This is the U.S. public domain database for protein and 
gene sequence deposition (Figure 37). Ro60 (ADS5) is a Homo sapiens sequence; its 
SWISS-PROT protein ID is P10155 and it is 538 amino acids in length. Ro60 (ADS5) 
was cloned by a group of scientists at the W.M. Keck Autoimmune Disease Center, 
California. The entry identifies Ro60 (ADS5) as an RNA-binding protein associated with 
the autoimmune disease Sjögren's syndrome. The public domain information for this 
gene does not annotate it as an adhesion molecule. 

Therefore, it can be concluded that, using all public domain annotation tools, Ro60 
(ADS5) cannot be annotated as an adhesion molecule. Only the Inpharmatica Genome 
Threader is able to annotate this protein as an adhesion molecule. 

The reverse search is now carried out. P10155 (ADS5) is now used as the query sequence 
in the Biopendium™ (see Figure 38A). The Inpharmatica Genome Threader identifies 
Ro60 (ADS5) as having a structure that is the same as Homo sapiens Integrin CR3 with 
70% confidence (see arrow in Figure 38B). Homo sapiens Integrin CR3 (1JLM) was the 
original query sequence. Positive iterations of PSI-Blast do not return this result (Figure 
38C). It is only the Inpharmatica Genome Threader that is able to identify this 
relationship. 

The Homo sapiens Integrin CR3 sequence is chosen against which to view the sequence 
alignment of Ro60 (ADS5). Viewing the AlEye alignment (Figure 39) of the query 
protein against the protein identified as being of a similar structure helps to visualize the 
areas of homology. 

The Integrin CR3 I domain requires a bound metal ion in order to function. The metal ion 
forms a Metal Ion-Dependent Adhesion Site (MIDAS) which is characterised by a 
MIDAS motif consisting of the conserved metal liganding residues. The MIDAS motif in 
1JLM consists of ASP9, SER11, SER13, THR78 and ASP111; all these residues except 
for THR78 are conserved in Ro60 (ADS5), as ASP376, SER378, SER380 and ASP469 
respectively, with ILE440 aligning to THR78. The two serines and ASP111 are the metal 
ion ligands. This indicates that Ro60 (ADS5) is an adhesion molecule similar to Integrin CR3. 
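
The reasoning above, that the residues named as metal ion ligands are all conserved even 
though THR78 is not, can be checked mechanically. The sketch below is a minimal 
illustration in Python using only the residue labels quoted in the text, with residue 440 
taken as ILE as given where the alignment is restated; the helper names are hypothetical. 

```python
import re

LABEL = re.compile(r"^([A-Z]{3})(\d+)$")

# 1JLM MIDAS residue -> aligned Ro60 residue, as listed in the text.
PAIRS = [
    ("ASP9", "ASP376"),
    ("SER11", "SER378"),
    ("SER13", "SER380"),
    ("THR78", "ILE440"),
    ("ASP111", "ASP469"),
]

# Residues the text names as metal ion ligands (the two serines and ASP111).
LIGANDS = {"SER11", "SER13", "ASP111"}


def parse(label):
    """Split a label such as 'ASP111' into ('ASP', 111)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError("unrecognised residue label: " + label)
    return match.group(1), int(match.group(2))


def summarise(pairs, ligands):
    """Record, for each aligned pair, whether the residue type is unchanged."""
    rows = []
    for template, query in pairs:
        t_name, _ = parse(template)
        q_name, _ = parse(query)
        rows.append({
            "template": template,
            "query": query,
            "conserved": t_name == q_name,
            "ligand": template in ligands,
        })
    return rows


rows = summarise(PAIRS, LIGANDS)
ligands_conserved = all(row["conserved"] for row in rows if row["ligand"])
not_conserved = [row["template"] for row in rows if not row["conserved"]]
print(ligands_conserved, not_conserved)
```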

In order to ensure that the protein identified is in fact a relative of the query sequence, the 
visualization programs "LigEye" (Figure 40A) and "iRasmol" (Figure 40B) are used. 
These visualization tools identify the metal binding site of known protein structures by 
indicating the amino acids with which known metal ions or small molecule inhibitors 
interact at the active site. These interactions are through either a direct hydrogen bond or 
through hydrophobic interactions. In this manner one can see if the active site 
fold/structure is conserved between the identified homologue and the chosen protein of 
known structure. 

Since the structure of Homo sapiens Integrin CR3 is known (1JLM), this is chosen to 
illustrate the MIDAS motif (Figure 40B). ASP9, SER11, SER13, THR78 and ASP111 of 
1JLM align with ASP376, SER378, SER380, ILE440 and ASP469 of Ro60. This 
indicates that, as predicted by the Inpharmatica Genome Threader, Ro60 (ADS5) 
folds in a similar manner to Homo sapiens Integrin CR3 and as such is identified as an 
adhesion molecule. 

Reverse-maximised PSI-BLAST of Ro60 (ADS5) identifies Mus musculus, Xenopus 
laevis and Caenorhabditis elegans homologs of Ro60 (ADS5) called AAF19049.1, 
AAC38001.1 and CAA98241.1 respectively. AAF19049.1 has 90.0% sequence identity 
to P10155 (Homo sapiens Ro60; ADS5), see Figure 38C. AAC38001.1 has 76.0% 
sequence identity to P10155 (Homo sapiens Ro60; ADS5), see Figure 38C. CAA98241.1 
has 36.0% sequence identity to P10155 (Homo sapiens Ro60; ADS5), see Figure 38C. 
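
For orientation, a percentage sequence identity of the kind quoted above can be computed 
from a pairwise alignment as sketched below. This is a minimal illustration in Python. 
The text does not state which denominator was used for the figures above; this sketch 
assumes identical residues divided by aligned columns in which neither sequence has a 
gap. The two toy sequences are hypothetical and are not taken from Figure 38C. 

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Both inputs must be rows of the same alignment, hence of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment, for illustration only.
toy_a = "DTSGSMGGFN"
toy_b = "DHSGS-GKFN"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full 
alignment length, give different values for the same alignment. 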

P10155 (Homo sapiens Ro60; ADS5), AAF19049.1, AAC38001.1 and 
CAA98241.1 are aligned and viewed in AlEye (Figure 41). AlEye reveals that the 
predicted metal binding residues SER378, SER380 and ASP469 of P10155 (Homo 
sapiens Ro60; ADS5) are conserved in the Mus musculus (AAF19049.1), Xenopus laevis 
(AAC38001.1) and Caenorhabditis elegans (CAA98241.1) Ro60 homologs. Another 
predicted MIDAS residue of P10155 (Homo sapiens Ro60; ADS5), ASP376, is also conserved. 
Residues that are essential for the function of a protein tend to be conserved in homologs 
of that protein. Thus the conservation of SER378, SER380 and ASP469 (residues which 
would be essential for the function of the I-domain) and ASP376 in the Mus musculus 
(AAF19049.1), Xenopus laevis (AAC38001.1) and Caenorhabditis elegans 
(CAA98241.1) Ro60 homologs strongly supports the annotation of P10155 (Homo 
sapiens Ro60; ADS5) as an adhesion molecule. 
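
The conservation check described above can be sketched as follows: a residue number in 
the reference sequence is mapped to its column in a multiple alignment, and the column is 
tested for a single shared residue. This is a minimal illustration in Python. The toy 
alignment and all names are hypothetical; the sequences of Figure 41 are not 
reproduced here. 

```python
def column_for_position(aligned_ref, position, gap="-"):
    """Map a 1-based ungapped residue number to a 0-based alignment column."""
    seen = 0
    for column, residue in enumerate(aligned_ref):
        if residue != gap:
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies beyond the end of the reference sequence")


def conserved_positions(alignment, ref_name, positions, gap="-"):
    """For each reference position, report whether every row shares the residue."""
    ref = alignment[ref_name]
    result = {}
    for position in positions:
        column = column_for_position(ref, position, gap)
        residues = {row[column] for row in alignment.values()}
        result[position] = len(residues) == 1 and gap not in residues
    return result


# Hypothetical toy alignment (rows of equal length), for illustration only.
TOY_ALIGNMENT = {
    "ref":  "MD-SGSK",
    "hom1": "MDASGSR",
    "hom2": "-D-SASK",
}

print(conserved_positions(TOY_ALIGNMENT, "ref", [2, 3, 4, 5]))
```

With a real alignment, the reference row would be the aligned P10155 sequence and the 
positions would be 376, 378, 380 and 469. 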



Sequence Listing 

SEQ ID NO: 1 Nucleotide coding sequence for AAC74854.1 (ADS1) protein 

1 atgacctggt ttattgaccg gcgtctgaac ggcaaaaaca aaagcatggt gaatcgccag 
61 cgttttttac gccgttataa agcgcaaatt aaacagtcga tctccgaggc cattaataag 
121 cgttcggtga ctgacgtcga cagcggcgaa tccgtatcca ttcccacgga agatattagc 
181 gaaccgatgt ttcatcaggg gcgtggcggt ctgcgccacc gcgtgcatcc gggcaatgac 
241 catttcgtcc agaacgaccg aattgaacgt ccccagggtg gcggcggagg ttccggcagt 
301 ggtcagggcc aggccagcca ggatggtgaa ggtcaggatg aatttgtctt tcagatttcg 
361 aaagatgagt atcttgatct gctctttgaa gatttggcct taccgaatct gaaacaaaac 
421 caacaacgcc agctgaccga atataaaacg catcgggcgg gttataccgc taacggcgtt 
481 ccggccaata tcagcgttgt gcgttcattg cagaactcac tggcgcgacg cacagccatg 
541 acggcaggca agcggcggga acttcatgca ctggaagaga atttggccat catcagcaac 
601 agtgaacctg cgcaactgct ggaagaggaa cgtctgcgca aagaaattgc agaattacgt 
661 gccaaaattg aacgcgtccc ttttattgac accttcgatt tacgttacaa gaactacgag 
721 aagcggcccg atccctccag ccaggcagtg atgttttgcc tgatggacgt ttccggttca 
781 atggatcaat ccactaaaga tatggctaag cgtttttata ttctgctgta tctgttcctc 
841 agcagaacgt ataagaacgt ggaagtcgta tacatccgcc atcataccca ggcgaaagaa 
901 gtcgatgaac atgagttttt ctactcgcag gaaacaggcg gcaccattgt ttccagcgcc 
961 ctgaaactga tggatgaggt agtgaaagag cgttataacc cggcacagtg gaatatttac 
1021 gctgcacaag catcggacgg cgataactgg gccgatgact ctccgctttg ccatgaaatc 
1081 ctggcgaaaa aattattacc tgttgttcgt tattacagct atatcgaaat tacccgtcgt 
1141 gcacatcaga cattgtggcg agaatatgag catctgcaat ctactttcga caactttgcg 
1201 atgcagcaca tccgcgacca ggatgatatt tatccggtgt tccgtgaact gtttcataaa 
1261 caaaatgcaa cagctaaagg ctaa 

SEQ ID NO: 2 Protein AAC74854.1; ADS1 

1 mtwfidrrln gknksmvnrq rflrrykaqi kqsiseaink rsvtdvdsge svsiptedis 
61 epmfhqgrgg lrhrvhpgnd hfvqndrier pqgggggsgs gqgqasqdge gqdefvfqis 
121 kdeyldllfe dlalpnlkqn qqrqlteykt hragytangv panisvvrsl qnslarrtam 
181 tagkrrelha leenlaiisn sepaqlleee rlrkeiaelr akiervpfid tfdlryknye 
241 krpdpssqav mfclmdvsgs mdqstkdmak rfyillylfl srtyknvevv yirhhtqake 
301 vdeheffysq etggtivssa lklmdevvke rynpaqwniy aaqasdgdnw addsplchei 
361 lakkllpvvr yysyieitrr ahqtlwreye hlqstfdnfa mqhirdqddi ypvfrelfhk 
421 qnatakg 

SEQ ID NO: 3 Nucleotide coding sequence for AAC76768.1 (ADS2) protein 

1 gtgcgcagtc ggctgaaaga tgcccgagtc ccgccggaac tcaccgaaga ggtgatgtgc 
61 tatcagcaaa gccagctcct ctccacgcca cagtttattg tgcagctacc acagatcctg 
121 gacttactgc atcgtctgaa ttctccatgg gcagaacaag cccgacagtt ggttgatgct 
181 aacagcacga tcacttcagc gttacacacg ctttttctcc agcgttggcg tttaagtctg 
241 atcgtgcaag caacgacgtt aaatcaacag ctattagaag aagaacgcga acaactgttg 
301 agtgaagttc aggaacgcat gacgctgagc ggacaacttg aaccgattct cgcagataac 
361 aatactgcag ctggtcgtct gtgggatatg agcgccggtc agcttaaacg tggcgactat 
421 cagttgattg tgaaatacgg tgaatttctt aacgaacagc cggaactgaa acgcctggca 
481 gagcagctgg ggcgttctcg ggaagccaaa tcaataccgc gcaacgatgc gcagatggaa 
541 accttccgca ccatggtgcg cgaaccggcg acggttcctg agcaggttga tggtctgcaa 
601 caaagcgatg atattttacg tctcctgccg ccagaactgg cgacactagg gataacggaa 
661 ctggagtatg agttttaccg tcggctggtg gaaaaacagt tgctcaccta tcgcctgcac 
721 ggtgagtcgt ggcgtgaaaa agtgatcgaa cgtccggtgg tacataaaga ttacgatgaa 
781 cagccgcgcg ggccgtttat tgtctgtgtg gatacttccg gctcaatggg cggctttaat 
841 gaacagtgtg cgaaagcgtt ctgcctggcc ttgatgcgca ttgctctcgc agaaaaccgg 
901 cgctgctata ttatgctatt ttccaccgag atcgtccgtt atgagctttc aggcccacaa 
961 ggcatcgaac aagcaatccg ttttttaagc cagcagtttc gtggcggcac cgatcttgcc 
1021 agttgttttc gcgccattat ggaacgcttg caaagcaggg aatggtttga tgccgatgcg 
1081 gtggtgattt ctgattttat cgctcagcgg ttgcctgacg acgtgacgag taaagtgaaa 
1141 gagctgcagc gggtacatca gcatcgcttt catgccgtgg cgatgtcggc acacggcaaa 
1201 cccggcatca tgcgcatttt cgatcatatc tggcgctttg ataccgggat gcgaagccgc 
1261 ctgctcagac gctggcggcg ataa 

SEQ ID NO: 4 Protein AAC76768.1; ADS2 

1 mrsrlkdarv ppelteevmc yqqsqllstp qfivqlpqil dllhrlnspw aeqarqlvda 
61 nstitsalht lflqrwrlsl ivqattlnqq lleeereqll sevqermtls gqlepiladn 
121 ntaagrlwdm sagqlkrgdy qlivkygefl neqpelkrla eqlgrsreak siprndaqme 
181 tfrtmvrepa tvpeqvdglq qsddilrllp pelatlgite leyefyrrlv ekqlltyrlh 
241 geswrekvie rpvvhkdyde qprgpfivcv dtsgsmggfn eqcakafcla lmrialaenr 
301 rcyimlfste ivryelsgpq gieqairfls qqfrggtdla scfraimerl qsrewfdada 
361 vvisdfiaqr lpddvtskvk elqrvhqhrf havamsahgk pgimrifdhi wrfdtgmrsr 
421 llrrwrr 

SEQ ID NO: 5 Nucleotide coding sequence for AAD05645.1 (ADS3) protein 

1 atgattgaaa taagcgaatg gttgcaaaaa ctagacgatg ccttagataa agttgttgct 
61 aaaaaagagc cagagagttt tctcaagccg atcatttcac caatagagga ctaccaaaag 
121 agtgtcaggc aaattcaagc gcaattcaca gacgcgccga agttcaatga agagggtgct 
181 taccctcaat ttttaagctg tggtttattg caagttaggg gcaaaaatgg tgctaacatg 
241 gaatttttat tgcctaaagt ttatcctttc ccccctaaaa gcttgtatat agagcatgaa 
301 aaagacgggc agtttttgag agaaatgctc atgcgcttac tctccagcgc gcctttagtg 
361 caattggaag tgatcttaat tgatgcgttg agcttggggg gcattttcaa tctggccaga 
421 aggcttttag ataaaaacaa tgactttatt taccagcaaa ggattttgac cgaaagcaag 
481 gaaatagaag aagccctaaa gcatttgcat gaatatttaa aggttaattt gcaagaaaaa 
541 ttagccggtt ttagagattt tgtgcattat aatgaaaacg ccaaagactc cttgccttta 
601 aaagcgcttt ttttaagcgg ggtggatgct ttgagtaaag acgcgcttta ttatctagaa 
661 aagatcatgc gttttggctc taaaaatggg gttttgagct ttgtcaattt ggagagcgaa 
721 aaaaacaatc aatccgcaga agatttgaaa cgctatgcgg agttttttaa agacaggaca 
781 agttttgagt gcttaaaata ccttaatgta gaaatcatca gcgatcaagg tattaaatcc 
841 caacacatgc aagacttcgc tgataaaatt aaagcgtatt acaagcaaaa aaaagaagtt 
901 aaaagggagt tgaaggactt acaaagagac aaagaatttt ggactaaaag ctctcagcat 
961 gaagtaagcg tgccggtggg gtgggatatt aaccataagg aagtgtgttt taaaatcggt 
1021 aacgaacaaa accacacgct catttgcgac cacagcggga gtgggaaatc caatttcttg 
1081 catgtgttga ttcaaaatct agctttctac tacgatcccg atgaagtcca actctttttg 
1141 ttagactata aagagggggt ggaatttaac gcgtatgtag cagatcccgc tttagagcat 
1201 gcgaggttgg tgagcgtggc gagttcaatc tcttatggca tcactttctt gaaatggctt 
1261 tgtgatgaaa tgcaaaaaag agccgatcgg ttcaagcagt ttaatgtgaa agatttaagc 
1321 gattaccgca aacatgaaaa aatgcccaga ctgatcgtgg tgattgatga atttcaggtg 
1381 ctttttagcg ataataaatc cactaaagcg gtggaggggc atttaaacac cctgcttaaa 
1441 aagggccgta gctatggggt gcatttggtt ttggccactc aaaccatgcg cggcactgac 
1501 attaatccaa gctttaaggc tcaaatcgcc aaccgcatcg ctttgcctat ggatgcagaa 
1561 gacagcagta gtgttttggg cgatgatgcg gcttgtgaga ttcaaaaacc agaaggcatt 
1621 ttcaacaaca acggagggaa tagaaaatac cacaccaaga tgagtgtccc taaagcccct 
1681 gatgatttca aatcttttct cacaaaaata cacgctgaat ttaaccaaag aaatctcgca 
1741 cccatagatc gtaaaatcta taatggcgag acacctttaa aaatgcccga cacccttaag 
1801 gctaatgaaa tgcgtttgca tctgggcaaa aaagtggatt atgagcaaaa ggacctgata 
1861 gtggagtttg aaagtaacga atcgcatttg ttggtggtga tccaagattt aaacgctcgc 
1921 atcgctttaa tgaaactctt attccaaaac gttaagagcg cgaacaaaga attggttttt 
1981 tgcaataaag aaaaacgctt gataaggtct tttgatgcac aaaaagaata cggcatcacg 
2041 cctgtagaaa atattttaag cgttttagac accgctatga atcctaacag cgcgcttgtg 
2101 atagacaatc tcaacgaagc gaaagaattg cacgacaaag taggggcgga aaagttaaaa 
2161 tcgtttttag aaaaagccat agacaacgag cagtattgcg tcatttttgc gcatgacttt 
2221 aggcaaatta aaactaatta ccattttgac aagttaaaag aattgttaaa caaccacttc 
2281 aagcaatgcc tagcctttag gtgcaatggg gaaaacttga acgctatcaa aagcgatttg 
2341 cctccaccaa gcaaacttaa cgtgctattg atagagcttt ccaaagacag cgttactgaa 
2401 ttcaggcctt tcagcttata g 

SEQ ID NO: 6 Protein AAD05645.1; ADS3 

1 mieisewlqk lddaldkwa kkepesflkp iispiedyqk svrqiqaqft dapkfneega 
61 ypqflscgll qvrgknganm efllpkvypf ppkslyiehe kdgqflreml mrllssaplv 
121 qlevilidal slggifnlar rlldknndfi yqqriltesk eieealkhlh eylkvnlqek 
181 lagfrdfvhy nenakdslpl kalflsgvda lskdalyyle kimrfgskng vlsfvnlese 

241 knnqsaedlk ryaeffkdrt sfeclkylnv eiisdqgiks qhmqdfadki kayykqkkev
301 krelkdlqrd kefwtkssqh evsvpvgwdi nhkevcfkig neqnhtlicd hsgsgksnfl
361 hvliqnlafy ydpdevqlfl ldykegvefn ayvadpaleh arlvsvassi sygitflkwl
421 cdemqkradr fkqfnvkdls dyrkhekmpr livvidefqv lfsdnkstka veghlntllk
481 kgrsygvhlv latqtmrgtd inpsfkaqia nrialpmdae dsssvlgdda aceiqkpegi
541 fnnnggnrky htkmsvpkap ddfksfltki haefnqrnla pidrkiynge tplkmpdtlk
601 anemrlhlgk kvdyeqkdli vefesneshl lvviqdlnar ialmkllfqn vksankelvf
661 cnkekrlirs fdaqkeygit pvenilsvld tanmpnsalv idnlneakel hdkvgaeklk 
721 sflekaidne qycvifahdf rqiktnyhfd klkellnnhf kqclafrcng enlnaiksdl 
781 pppsklnvll ielskdsvte frpfsl 

SEQ ID NO: 7 Nucleotide coding sequence for VP4 (ADS4) protein 

1 ggctataaaa tggcttcgct catttataga caacttctca ctaattcata ttcggtagac 
61 ttgtatgacg aaatagaaca gattggatcg gagaaaactc aaaatgtgac gataaatcca 
121 ggcccttttg cacagactag atatgctcca gttgattggg gacacggaga gattaatgat 
181 tcaactacag tggaaccagt tttagatggt ccgtatcaac ccactacatt caaaccaccc 

241 aatgattatt ggctgcttat tagctcaaat acagatggag tagtctatga gagtacaaat
301 aatagtgact tttggacagc agttatcgct gtcgaaccac atgttagtca aacaaatagg
361 caatatgttt tatttggtga gaataagcag tttaacgtag aaaataattc agataaatgg
421 aaatttttcg aaatgtttaa aggtagtagt cagagtgatt tttctaatag acggactcta 
481 acctctaata atagacttgt aggaatgcta aaatatggtg gaagagtatg gacgtttcat 
541 ggtgaaacac caagagctac tactgatagt tcgaatactg cggatttaaa taatatatca 
601 attataattc attcagagtt ttatattatt ccaagatccc aagaatctaa gtgtaatgaa 
661 tatattaata atggtttacc accaattcaa aatactagaa atgtagttcc attatctcta 
721 tcatccagat ctattcaata taggagagca caagttaatg aagatattac aatctcaaaa 
781 acttcattat ggaaggaaat gcaatataat agagatatta taataagatt taaatttggt
841 aatagtgtca taaaactagg aggattgggg tataaatggt ctgaaatatc atacaaagca 
901 gcgaattatc aatatagcta ttcacgtgat ggtgaacaag ttactgcgca taccacctgt 
961 tcagtaaatg gagtaaataa ttttagctat aatggaggct cgctacctac tgatttcagt 
1021 atttcgagat atgaagttat taaaggaatt cattatgtat atatagacta ctgggatgat 
1081 tcaaaagcat tcagaaatat ggtatatgtt agatcgttag cagcaaattt gaattcagtg 
1141 aaatgtgtag gtgggagtta tgattttaga ttacctgtag gtgaatggcc tattatgaat 
1201 ggcggtgctg tatcattaca ttttgctgga gtgacattat ctacacagtt cactgatttt 

1261 gtatcattga attcattacg atttagattc agtttaacag tagatgaacc atctttctca
1321 ataatacgaa cacgtacaat gaatttatac ggattaccag cagctaatcc aaacaatgga
1381 aatgaatact atgaagtgtc aggaaggttt tcacttattt ctttagttcc aaccattgat
1441 gattatcaaa ctccaattat gaattcagta acagtaaggc aagatttaga acgccagctt
1501 aatgatttac gagaagaatt taattcattg tcacaagaaa tagctatgtc acaattaatt
1561 gatttagcat tattaccttt agacatgttt tctatgtttt cggggataaa aagcactatt
1621 gatctgacca agtcaatggc aactagtgta atgaaaaaat ttagaaaatc aaaattagcc
1681 acatcaattt cagaaatgac taattcattg tcagatgcgg cttcgtcagc atcaagaagt
1741 gtttctatta gatcaaattt ttcgacgatt tcaaattggt ctgatgcttc aaaaagtgtg
1801 ttaaatgtaa ctgactcagt aaatgatgtt tcaacacaaa cttctataat tagtaagaaa
1861 cttagattaa gagagatgat tactcaaact gaaggaatta gttttgacga tatttcagca
1921 gctgtactga aaacgaaaat agatatgtcc acacaaattg gaaaaaatac cttacctgat
1981 gtagttactg aagactctga aaagtttatt ccaaaacgat cgtatcgagt attaaaagat
2041 aatgaagtga tggaaattaa cactgaagga aagttttttg cttataaagt ggatacactc
2101 aatgagatcc cattcgatat aaataaattc gcggaacttg taacggattc tccagttata
2161 tcagcgataa tagactttaa gcgattaaaa aatttaaacg ataattatgg aattactcgc
2221 atagaagcgc ttaatttaat taaatcgaat ccgaatgtac tacgtaattt tattaatcaa
2281 aataatccaa ttataagaaa tagaattgag cagttaattc tacaatgtaa attgtgagaa
2341 tgtcattgag gatgtgacc











SEQ ID NO: 8 Protein VP4; ADS4 

1 masliyrqll tnsysvdlyd eieqigsekt qnvtinpgpf aqtryapvdw ghgeindstt
61 vepvldgpyq pttfkppndy wllissntdg vvyestnnsd fwtaviavep hvsqtnrqyv
121 lfgenkqfnv ennsdkwkff emfkgssqsd fsnrrtltsn nrlvgmlkyg grvwtfhget
181 prattdssnt adlnnisiii hsefyiiprs qeskcneyin nglppiqntr nvvplslssr
241 siqyrraqvn editisktsl wkemqynrdi iirfkfgnsv iklgglgykw seisykaany
301 qysysrdgeq vtahttcsvn gvnnfsyngg slptdfsisr yevikgihyv yidywddska
361 frnmvyvrsl aanlnsvkcv ggsydfrlpv gewpimngga vslhfagvtl stqftdfvsl
421 nslrfrfslt vdepsfsiir trtmnlyglp aanpnngney yevsgrfsli slvptiddyq
481 tpimnsvtvr qdlerqlndl reefnslsqe iamsqlidla llpldmfsmf sgikstidlt
541 ksmatsvmkk frksklatsi semtnslsda assasrsvsi rsnfstisnw sdasksvlnv
601 tdsvndvstq tsiiskklrl remitqtegi sfddisaavl ktkidmstqi gkntlpdvvt
661 edsekfipkr syrvlkdnev meintegkff aykvdtlnei pfdinkfael vtdspvisai
721 idfkrlknln dnygitriea lnliksnpnv lrnfinqnnp iirnrieqli lqckl
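
The correspondence between the coding sequence of SEQ ID NO: 7 and the protein of SEQ ID NO: 8 can be checked by conceptual translation. The following is a minimal Python sketch only. It assumes that the standard genetic code applies and that the coding region begins at the first atg of the listed sequence; the function name `translate` and the variable names are illustrative and do not form part of the original disclosure.

```python
# Standard genetic code with codons enumerated in t, c, a, g order.
# Assumption: the standard code (NCBI translation table 1) applies here.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first atg until a stop codon or the end of the input."""
    dna = "".join(dna.lower().split())  # drop the spaces of the listing layout
    start = dna.find("atg")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X marks an unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 60 nucleotides of SEQ ID NO: 7 as listed.
seq7_start = "ggctataaaa tggcttcgct catttataga caacttctca ctaattcata ttcggtagac"
print(translate(seq7_start).lower())
```

The printed residues can be compared with the start of SEQ ID NO: 8. The same function may be applied to the other coding sequences, subject to the same assumptions.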

SEQ ID NO: 9 Nucleotide coding sequence for R06O (ADS5) protein
1 attttgcctt tttgttaggt ttcctaaaga caaaaaaaaa tggaggaatc tgtaaaccaa 
61 atgcagccac tgaatgagaa gcagatagcc aattctcagg atggatatgt atggcaagtc 
121 actgacatga atcgactaca ccggttctta tgtttcggtt ctgaaggtgg gacttattat 
181 atcaaagaac agaagttggg ccttgaaaat gctgaagctt taattagatt gattgaagat 
241 ggcagaggat gtgaagtgat acaagaaata aagtcattta gtcaagaagg cagaaccaca 
301 aagcaagagc ctatgctctt tgcacttgcc atttgttccc agtgctccga cataagcaca 
361 aaacaagcag catttaaagc tgtttctgaa gtttgtcgca ttcctaccca tctctttact
421 tttatccagt ttaagaaaga tctgaaggaa agcatgaaat gtggcatgtg gggtcgtgcc 
481 ctccggaagg ctatagcgga ctggtacaat gagaaaggtg gcatggccct tgctctggca 
541 gttacaaaat ataaacagag aaatggctgg tctcacaaag atctattaag attgtcacat 
601 cttaaacctt ccagtgaagg acttgcaatt gtgaccaaat atattacaaa gggctggaaa 
661 gaagttcatg aattgtataa agaaaaagca ctctctgtgg agactgaaaa attattaaag 
721 tatctggagg ctgtagagaa agtgaagcgc acaaaagatg agctagaagt cattcatcta 
781 atagaagaac atagattagt tagagaacat cttttaacaa atcacttaaa gtctaaagag 
841 gtatggaagg ctttgttaca agaaatgccg cttactgcat tactaaggaa tctaggaaag 
901 atgactgcta attcagtact tgaaccagga aattcagaag tatctttagt atgtgaaaaa 
961 ctgtgtaatg aaaaactatt aaaaaaggct cgtatacatc catttcatat tttgatcgca 
1021 ttagaaactt acaagacagg tcatggtctc agagggaaac tgaagtggcg ccctgatgaa 
1081 gaaattttga aagcattgga tgctgctttt tataaaacat ttaagacagt tgaaccaact 
1141 ggaaaacgtt tcttactagc tgttgatgtc agtgcttcta tgaaccaaag agttttgggt 
1201 agtatactca acgctagtac agttgctgca gcaatgtgca tggttgtcac acgaacagaa 
1261 aaagattctt atgtagttgc tttttccgat gaaatggtac catgtccagt gactacagat 
1321 atgaccttac aacaggtttt aatggctatg agtcagatcc cagcaggtgg aactgattgc 
1381 tctcttccaa tgatctgggc tcagaagaca aacacacctg ctgatgtctt cattgtattc 
1441 actgataatg agacctttgc tggaggtgtc catcctgcta ttgctctgag ggagtatcga 
1501 aagaaaatgg atattccagc taaattgatt gtttgtggaa tgacatcaaa tggtttcacc 
1561 attgcagacc cagatgatag aggcatgttg gatatgtgcg gctttgatac tggagctctg 
1621 gatgtaattc gaaatttcac attagatatg atttaaccat aagcagcagc acgatccaga 
1681 gatccattgc catcagtgat ctcactaaaa aatatacagc tacttcccag ctaatctcca
1741 cccaatgaat gatgatggta tagtatgtgc ataatggaaa gttaccttac tg
SEQ ID NO: 10 Protein Ro60; (ADS5) 

1 meesvnqmqp lnekqiansq dgyvwqvtdm nrlhrflcfg seggtyyike qklglenaea 
61 lirliedgrg ceviqeiksf sqegrttkqe pmlfalaics qcsdistkqa afkavsevcr 
121 ipthlftfiq fkkdlkesmk cgmwgralrk aiadwynekg gmalalavtk ykqrngwshk 
181 dllrlshlkp sseglaivtk yitkgwkevh elykekalsv etekllkyle avekvkrtkd 
241 elevihliee hrlvrehllt nhlkskevwk allqemplta llrnlgkmta nsvlepgnse 
301 vslvceklcn ekllkkarih pfhilialet yktghglrgk lkwrpdeeil kaldaafykt
361 fktveptgkr fllavdvsas mnqrvlgsil nastvaaamc mvvtrtekds yvvafsdemv
421 pcpvttdmtl qqvlmamsqi paggtdcslp miwaqktntp advfivftdn etfaggvhpa
481 ialreyrkkm dipaklivcg mtsngftiad pddrgmldmc gfdtgaldvi rnftldmi

CLAIMS

1. A polypeptide, which polypeptide:

(i) has the amino acid sequence as recited in SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID
NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10;

(ii) is a fragment thereof having adhesion molecule activity or having an antigenic
determinant in common with the polypeptide of (i); or

(iii) is a functional equivalent of (i) or (ii).

2. A polypeptide which is a functional equivalent according to claim 1(iii), is homologous
to the amino acid sequence as recited in SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO:
6, SEQ ID NO: 8 or SEQ ID NO: 10, and has adhesion molecule activity.

3. A fragment or functional equivalent according to claim 1 or claim 2, which has greater
than 50% sequence identity with the amino acid sequence recited in SEQ ID NO: 2,
SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10, or with active
fragments thereof, preferably greater than 60%, 70%, 80%, 90%, 95%, 98% or 99%
sequence identity.

4. A functional equivalent according to any one of claims 1-3, which exhibits significant
structural homology with a polypeptide having the amino acid sequence given in SEQ
ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10.

5. A fragment as recited in claim 1 or claim 3 having an antigenic determinant in common
with the polypeptide of claim 1(i), which consists of 7 or more (for example, 8, 10,
12, 14, 16, 18, 20 or more) amino acid residues from the sequence of SEQ ID NO: 2,
SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8 or SEQ ID NO: 10.

6. A purified nucleic acid molecule which encodes a polypeptide according to any one of
the preceding claims.

7. A purified nucleic acid molecule according to claim 6, which has the nucleic acid
sequence as recited in SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7
or SEQ ID NO: 9 or is a redundant equivalent or fragment thereof.

8. A purified nucleic acid molecule which hybridizes under high stringency conditions
with a nucleic acid molecule according to claim 6 or claim 7.

9. A vector comprising a nucleic acid molecule as recited in any one of claims 6-8. 

10. A host cell transformed with a vector according to claim 9. 

11. A ligand which binds specifically to, and which preferably inhibits the adhesion 
molecule activity of, a polypeptide according to any one of claims 1-5. 

12. A ligand according to claim 11, which is an antibody. 

13. A compound that either increases or decreases the level of expression or activity of a 
polypeptide according to any one of claims 1-5. 

14. A compound according to claim 13 that binds to a polypeptide according to any one of 
claims 1-5 without inducing any of the biological effects of the polypeptide. 

15. A compound according to claim 14, which is a natural or modified substrate, ligand, 
enzyme, receptor or structural or functional mimetic. 

16. A polypeptide according to any one of claims 1-5, a nucleic acid molecule according to
any one of claims 6-8, a vector according to claim 9, a ligand according to claim 11
or 12, or a compound according to any one of claims 13-15, for use in therapy or
diagnosis of disease.

17. A method of diagnosing a disease in a patient, comprising assessing the level of
expression of a natural gene encoding a polypeptide according to any one of claims 1-5,
or assessing the activity of a polypeptide according to any one of claims 1-5, in
tissue from said patient and comparing said level of expression or activity to a control
level, wherein a level that is different to said control level is indicative of disease.

18. A method according to claim 17 that is carried out in vitro. 

19. A method according to claim 17 or claim 18, which comprises the steps of: (a) 
contacting a ligand according to claim 11 or claim 12 with a biological sample under 
conditions suitable for the formation of a ligand-polypeptide complex; and (b) 
detecting said complex. 

20. A method according to claim 17 or claim 18, comprising the steps of:

a) contacting a sample of tissue from the patient with a nucleic acid probe under
stringent conditions that allow the formation of a hybrid complex between a nucleic
acid molecule according to any one of claims 6-8 and the probe;

b) contacting a control sample with said probe under the same conditions used in step a);
and

c) detecting the presence of hybrid complexes in said samples; wherein detection of
levels of the hybrid complex in the patient sample that differ from levels of the hybrid
complex in the control sample is indicative of disease.

21. A method according to claim 17 or claim 18, comprising:

a) contacting a sample of nucleic acid from tissue of the patient with a nucleic acid primer
under stringent conditions that allow the formation of a hybrid complex between a
nucleic acid molecule according to any one of claims 6-8 and the primer;

b) contacting a control sample with said primer under the same conditions used in step a);
and

c) amplifying the sampled nucleic acid; and

d) detecting the level of amplified nucleic acid from both patient and control samples;

wherein detection of levels of the amplified nucleic acid in the patient sample that
differ significantly from levels of the amplified nucleic acid in the control sample is
indicative of disease.

22. A method according to claim 17 or claim 18 comprising:

a) obtaining a tissue sample from a patient being tested for disease;

b) isolating a nucleic acid molecule according to any one of claims 6-8 from said tissue
sample; and

c) diagnosing the patient for disease by detecting the presence of a mutation which is
associated with disease in the nucleic acid molecule as an indication of the disease.

23. The method of claim 22, further comprising amplifying the nucleic acid molecule to 
form an amplified product and detecting the presence or absence of a mutation in the 
amplified product.

24. The method of either claim 22 or 23, wherein the presence or absence of the mutation
in the patient is detected by contacting said nucleic acid molecule with a nucleic acid 
probe that hybridises to said nucleic acid molecule under stringent conditions to form 
a hybrid double-stranded molecule, the hybrid double-stranded molecule having an 
unhybridised portion of the nucleic acid probe strand at any portion corresponding to 
a mutation associated with disease; and detecting the presence or absence of an 
unhybridised portion of the probe strand as an indication of the presence or absence 
of a disease-associated mutation. 

25. A method according to any one of claims 17-24, wherein said disease is 
cardiovascular disease, cancer, asthma, chronic obstructive pulmonary disease 
(COPD), inflammatory disease or a bacterial infection. 

26. Use of a polypeptide according to any one of claims 1-5 as an adhesion molecule. 

27. A pharmaceutical composition comprising a polypeptide according to any one of
claims 1-5, a nucleic acid molecule according to any one of claims 6-8, a vector
according to claim 9, a ligand according to claim 11 or 12, or a compound according
to any one of claims 13-15.

28. A vaccine composition comprising a polypeptide according to any one of claims 1-5
or a nucleic acid molecule according to any one of claims 6-8.

29. A polypeptide according to any one of claims 1-5, a nucleic acid molecule according
to any one of claims 6-8, a vector according to claim 9, a ligand according to claim 11
or 12, a compound according to any one of claims 13-15, or a pharmaceutical
composition according to claim 27, for use in the manufacture of a medicament for
the treatment of cardiovascular disease, cancer, asthma, chronic obstructive
pulmonary disease (COPD), inflammatory disease or a bacterial infection.

30. A method of treating a disease in a patient, comprising administering to the patient a
polypeptide according to any one of claims 1-5, a nucleic acid molecule according to
any one of claims 6-8, a vector according to claim 9, a ligand according to claim 11
or 12, a compound according to any one of claims 13-15, or a pharmaceutical
composition according to claim 27.

31. A method according to claim 30, wherein, for diseases in which the expression of the
natural gene or the activity of the polypeptide is lower in a diseased patient when
compared to the level of expression or activity in a healthy patient, the polypeptide,
nucleic acid molecule, vector, ligand, compound or composition administered to the
patient is an agonist.

32. A method according to claim 30, wherein, for diseases in which the expression of the
natural gene or activity of the polypeptide is higher in a diseased patient when
compared to the level of expression or activity in a healthy patient, the polypeptide,
nucleic acid molecule, vector, ligand, compound or composition administered to the
patient is an antagonist.

33. A method of monitoring the therapeutic treatment of disease in a patient, comprising
monitoring over a period of time the level of expression or activity of a polypeptide
according to any one of claims 1-5, or the level of expression of a nucleic acid
molecule according to any one of claims 6-8 in tissue from said patient, wherein
altering said level of expression or activity over the period of time towards a control
level is indicative of regression of said disease.

34. A method for the identification of a compound that is effective in the treatment and/or
diagnosis of disease, comprising contacting a polypeptide according to any one of
claims 1-5, or a nucleic acid molecule according to any one of claims 6-8 with one or
more compounds suspected of possessing binding affinity for said polypeptide or
nucleic acid molecule, and selecting a compound that binds specifically to said
nucleic acid molecule or polypeptide.

35. A kit useful for diagnosing disease comprising a first container containing a nucleic
acid probe that hybridises under stringent conditions with a nucleic acid molecule
according to any one of claims 6-8; a second container containing primers useful for
amplifying said nucleic acid molecule; and instructions for using the probe and
primers for facilitating the diagnosis of disease.

36. The kit of claim 35, further comprising a third container holding an agent for digesting
unhybridised RNA.

37. A kit comprising an array of nucleic acid molecules, at least one of which is a nucleic
acid molecule according to any one of claims 6-8. 

38. A kit comprising one or more antibodies that bind to a polypeptide as recited in any 
one of claims 1-5; and a reagent useful for the detection of a binding reaction between 
said antibody and said polypeptide. 

39. A transgenic or knockout non-human animal that has been transformed to express 
higher, lower or absent levels of a polypeptide according to any one of claims 1-5. 

40. A method for screening for a compound effective to treat disease, by contacting a non- 
human transgenic animal according to claim 39 with a candidate compound and 
determining the effect of the compound on the disease of the animal. 
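
Claim 3 recites sequence identity thresholds (greater than 50%, preferably greater than 60%, 70%, 80%, 90%, 95%, 98% or 99%). The claims do not prescribe a particular formula, so the following Python sketch is an illustration only. It assumes that the two sequences have already been aligned, with `-` marking gaps, and that identity is counted as identical residues divided by the number of alignment columns in which at least one sequence has a residue. The names `percent_identity`, `thresholds_exceeded` and `THRESHOLDS` are hypothetical.

```python
# Thresholds recited in claim 3, in percent.
THRESHOLDS = (50, 60, 70, 80, 90, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' denotes a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.lower(), aligned_b.lower()):
        if x == "-" and y == "-":
            continue  # a column with gaps in both sequences carries no information
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def thresholds_exceeded(identity: float) -> list:
    """Return the claim 3 thresholds that the identity is strictly greater than."""
    return [t for t in THRESHOLDS if identity > t]


# Toy example: one substitution in ten aligned residues.
identity = percent_identity("masliyrqll", "masliyrkll")
print(identity, thresholds_exceeded(identity))
```

Other conventions exist, for example dividing by the length of the shorter sequence, and they may give different values for the same pair of sequences.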
AAC74854 427 BCT

orf, hypothetical protein [Escherichia coli K12].
AAC74854
g1788084

AAC74854.1 GI:1788084

locus AE000273 accession AE000273.1

Escherichia coli K12.
Escherichia coli K12

Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
Escherichia.

1 (residues 1 to 427)

Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
Mau,B. and Shao,Y.

The complete genome sequence of Escherichia coli K-12

Science 277 (5331), 1453-1474 (1997)

97426617

9278503

2 (residues 1 to 427)
Blattner,F.R.

Direct Submission

Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
608-263-7459

3 (residues 1 to 427)
Blattner,F.R.

Direct Submission

Submitted (02-SEP-1997) Guy Plunkett III, Laboratory of Genetics,
University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
608-263-7459

4 (residues 1 to 427)
Plunkett,G. III.
Direct Submission

Submitted (13-OCT-1998) Laboratory of Genetics, University of
Wisconsin, 445 Henry Mall, Madison, WI 53706, USA

This sequence was determined by the E. coli Genome Project at the
University of Wisconsin-Madison (Frederick R. Blattner, director).
Supported by NIH grants HG00301 and HG01428 (from the Human Genome
Project and NCHGR). The entire sequence was independently
determined from E. coli K12 strain MG1655. Predicted open reading
frames were determined using GeneMark software, kindly supplied by
Mark Borodovsky, Georgia Institute of Technology, Atlanta, GA,
30332 [e-mail: mark@amber.gatech.edu]. Open reading frames that
have been correlated with genetic loci are being annotated with CG
Site Nos., unique ID nos. for the genes in the E. coli Genetic
Stock Center (CGSC) database at Yale University, kindly supplied by
Mary Berlyn. A public version of the database is accessible
(http://cgsc.biology.yale.edu). Annotation of the genome is an
ongoing task whose goal is to make the genome sequence more useful
by correlating it with other data. Comments to the authors are
appreciated. Updated information will be available at the E. coli
Genome Project's World Wide Web site
(http://www.genetics.wisc.edu). *** The E. coli K12 sequence and
its annotations are periodically updated; this is version M54. No
sequence changes. Annotation updates: updated gene identifications
and products; all new functional assignments courtesy of Monica
Riley; added promoters, protein binding sites, and repeated
sequences described in reference 1. The unique numeric identifiers
beginning with a lowercase 'b' assigned to each gene (protein- or
RNA-encoding) are now designated as gene synonyms instead of
labels. This should allow them to be searched for in Entrez as gene

Aligned annotation view for AAC76768.1

1. AAA82097.1

Primary database information:
Secondary database information:
Inpharmatica calculated information:


http://www.sanger.ac.uk/cgi-bin/Pfam/nph-search.cgi

Pfam
Protein families database of alignments and HMMs

Results for gi|2367274|gb|AAC76768.1|

There were no matches to Pfam-A (including borderline matches) for gi|2367274|gb|AAC76768.1|

Matches to Pfam-B

Domain        Start  End  E-value   Alignment
Pfam-B_15204  204    408  2.4e-108  Align

[427 residues]

Alignments of Pfam-B domains to best-matching Pfam-B sequence

Query gi|2367274|gb|AAC76768.1|/204-408 matching Pfam-B_15204



YIEM ECOLI 204 DILRLLPPELATL GITELEYEFYRRL VEKQLLTYRLHGE 5WREKVIERP V 253 
DILPXLPPELATLGITELEYEFYP^VEKQLLTYPXHGESWREKVIERPV / 
gi| 23672741 gb|AAc76768.1| 204 DILPXLPPELATLGITELEYEFYPJU.VEKQLLTYPXKGESWREKVIERPV 253 

YIEM ECOLI 254 VHKDYDE QPRGPFI V C VDT S G SM G GFNE Q C AKAF CL ALMRI ALAENRRCY 303 
VHKDYDE QPRGPFI V CVDT S 6 SM G GFHE Q C AKAF CLALMRI ALAENRRCY 
gij 2367274| gb | AAC76768.1| 254 VHXDYDE QPRGPFI V CVDT SGSMGGFHEQC AKAF CLALMRIALAEHRRCY 303 

YIEM ECOLI 304 IMLF STEIVRYEL S GP Q GIE Q AIRFL S Q QFRG GTDL AS CFRAIMERL Q SR 353 
" IMLF STEIVRYEL S GP Q GIE Q AIRFL S Q QFRG GTDL AS CFRAIMERL Q SR 

gi| 2367274| gb| AAC76768.il 304 IMLF STEIVRYEL SGPQ GIE QAIRFLSQ QFRG GTDL AS CFRAIMERL QSR 353 

YIEM ECOLI 354 EWFDAD AV VI SDFI AQRLPDD VT SKVKEL QRVMQHRFMAVAM SAHGKP GI 403 
~ EWFD ADAV VI SDFI AQRLPDD VT SKVKEL Q RVH QKRF HAV AM SAHGKP GI 

gi| 2367274] gb| AAC76768.il 354 EWFD ADAVVI SDFI AQRLPDD VT SKVKEL QRVHQKRFHAV AM SAHGKP GI 403 

YIEM_ECOLI 404 MRIFD 408 
. MRIFD 

gi| 2367274| gb| AAC76768.il 404 MRIFD 408 



http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&...



LOCUS       AAC76768     427 aa    BCT    01-DEC-2000
DEFINITION  orf, hypothetical protein [Escherichia coli K12].
ACCESSION   AAC76768
PID         g2367274
VERSION     AAC76768.1 GI:2367274
DBSOURCE    locus AE000451 accession AE000451.1
KEYWORDS
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Escherichia.
REFERENCE   1 (residues 1 to 427)
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,B. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K-12
  JOURNAL   Science 277 (5331), 1453-1474 (1997)
  MEDLINE   97426617
  PUBMED    9278503
REFERENCE   2 (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   3 (residues 1 to 427)
  AUTHORS   Blattner,F.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-SEP-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Email: ecoli@genetics.wisc.edu Phone: 608-262-2534 Fax:
            608-263-7459
REFERENCE   4 (residues 1 to 427)
  AUTHORS   Plunkett,G. III.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-OCT-1998) Laboratory of Genetics, University of
            Wisconsin, 445 Henry Mall, Madison, WI 53706, USA
COMMENT     This sequence was determined by the E. coli Genome Project at the
            University of Wisconsin-Madison (Frederick R. Blattner, director).
            Supported by NIH grants HG00301 and HG01428 (from the Human Genome
            Project and NCHGR). The entire sequence was independently
            determined from E. coli K12 strain MG1655. Predicted open reading
            frames were determined using GeneMark software, kindly supplied by
            Mark Borodovsky, Georgia Institute of Technology, Atlanta, GA,
            30332 [e-mail: mark@amber.gatech.edu]. Open reading frames that
            have been correlated with genetic loci are being annotated with CG
            Site Nos., unique ID nos. for the genes in the E. coli Genetic
            Stock Center (CGSC) database at Yale University, kindly supplied by
            Mary Berlyn. A public version of the database is accessible
            (http://cgsc.biology.yale.edu). Annotation of the genome is an
            ongoing task whose goal is to make the genome sequence more useful
            by correlating it with other data. Comments to the authors are
            appreciated. Updated information will be available at the E. coli
            Genome Project's World Wide Web site
            (http://www.genetics.wisc.edu). *** The E. coli K12 sequence and
            its annotations are periodically updated; this is version M54. No
            sequence changes. Annotation updates: updated gene identifications
            and products; all new functional assignments courtesy of Monica
            Riley; added promoters, protein binding sites, and repeated
            sequences described in reference 1. The unique numeric identifiers
            beginning with a lowercase 'b' assigned to each gene (protein- or
            RNA-encoding) are now designated as gene synonyms instead of
            labels. This should allow them to be searched for in Entrez as gene
            names.
            Method: conceptual translation.
FEATURES             Location/Qualifiers
     source          1..427
                     /organism="Escherichia coli K12"
                     /strain="K12"
                     /sub_strain="MG1655"
                     /db_xref="taxon:83333"
     Protein         1..427
                     /function="orf; unknown"
                     /product="orf, hypothetical protein"
     CDS             1..427
                     /gene="yieM"
                     /coded_by="complement(2367272:5249..6532)"
                     /transl_table=11
                     /note="f427; sequence change joins ORFs yieD and yieM from
                     earlier version"
ORIGIN
        1 mrsrlkdarv ppelteevmc yqqsqllstp qfivqlpqil dllhrlnspw aeqarqlvda
       61 nstitsaJht lflqrwrlsl ivqattlnqq lleeereqll sevqermtls gqlepiladn
      121 ntaagrlwdm sagqlkrgdy qlivkygefl neqpelkrla eqlgrsreak siprndaqme
      181 tfrtmvrepa tvpetf^g}^ qsddilrllp pelatlgite leyefyrrlv ekqlltyrlh



Accession Code Query

If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the report
(using the first_PB_iter annotation). This allows you to focus on more remote homologues which have been
detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative iterations or
using Genome-Threader are not affected by this option. However, if one match is found during the first e.g. 3
PSI-BLAST iterations and by Genome-Threader it will be excluded.
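
The exclusion rule described above can be sketched in Python as follows. This is an illustration only and not the implementation used by the database system. The `Match` record and its field names are hypothetical, and the sketch assumes that a match reported only by negative PSI-BLAST iterations or only by Genome Threader carries no positive-iteration annotation (modelled here as `None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """Hypothetical match record; field names are illustrative only."""
    name: str
    first_pb_iter: Optional[int]   # first positive PSI-BLAST iteration, or None
    genome_threader: bool = False  # True if Genome Threader also reported it


def filter_early_matches(matches: List[Match], n: int) -> List[Match]:
    """Drop every match first detected within PSI-BLAST iterations 1..n."""
    kept = []
    for m in matches:
        found_early = m.first_pb_iter is not None and 1 <= m.first_pb_iter <= n
        # A match found early is excluded even if Genome Threader also found it.
        if not found_early:
            kept.append(m)
    return kept


matches = [
    Match("close_homologue", first_pb_iter=1),
    Match("remote_homologue", first_pb_iter=5),
    Match("threader_only", first_pb_iter=None, genome_threader=True),
    Match("early_and_threader", first_pb_iter=2, genome_threader=True),
]
remaining = [m.name for m in filter_early_matches(matches, n=3)]
print(remaining)
```

With `n=3`, only the match first seen at iteration 5 and the match reported by Genome Threader alone remain in the report.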



Filter for the following SPECIES:

Aligned annotation view for AAD05645.1

Primary database information:
Secondary database information:
Inpharmatica calculated information:

http://www.sanger.ac.uk/cgi-bin/Pfam/nph-search.cgi

Pfam
Protein families database of alignments and HMMs

Results for gi|4154569|gb|AAD05645.1|

Trusted matches - domains scoring higher than the gathering threshold

Domain        Start  End  Bits    E-value             Alignment
FtsK_SpoIIIE  303    560  208.60  [illegible].4e-59   Align



Matches to Pfam-B

Domain        Start        End  E-value              Alignment
Pfam-B_38662  [illegible]  302  3e-15[illegible]     Align
Pfam-B_36020  561          806  [illegible]2e-129    Align

[806 residues]

FtsK_SpoIIIE 303-560

Alignments of Pfam-A domains to HMMs

Alignment of FtsK_SpoIIIE vs gi|4154569|gb|AAD05645.1|/303-560



gi|4l54569 
gi|4154S69 
gi|4154569 
gi|4154S69 
gi|4154569 
gi|4154569 



303 



- > e Ik e lldsk af rds rs rltialgk divs grip wadlvkmpvdagp r« 
ell£+l+++k+f++ +s+ ++ +++ +++++++V++ ++ ++ H 

ELRDL QRDKEFWTKS S QKEV SVP V CWDINHKEV CFKI GHE QH H 34 5 



iL iaG ate S GKSvf lnt Ills laarhsP e eV r lyt l±DpKgg . cL ap 1c dl 
Li+ +GSGXS+fl+ li++la++++P+eV+l+l+D+K+g e++++ 
346 TLICDKSGSGKSNFLWLIQNLAFYYDPDEVQLFLLDYKEGVEFNAYVAD 395 

phlls v . lvavatdp ek alralreLveEM e rRy e rallrqlgvrs ic . 
p l +++lv va+++ + + 1++L++EM++R + ++<r»-+v+++++y+ 
396 PM.EHAE1.V SVAS SI SY GITFL1OTL CDEM QKRA — DRFKQFHVKDL SDY r 443 

pne e iae vilggf gdvdlviimy deL ae enll, c rvtsLknqG Isy gvhvm. 
++ c+ +++ i ++++++++ l++D ri ++++ a + € + +++ t+Lk++++++++ v+ 
444 KHEKMPPXIVVIDEFQVLFSDKKSTKAVEG--HLNTLLKKGRSYGVHLVL 491 

atagrwle . c lpy iwiVD c racLmlaagk ds e JURr s rve dllaRlaqmk 
A t+++++++++p+ +++ + r*-+L ++a++ s ++ +d+ +++ + 
492 ATQTWRGTdlNPSFKAQIANRIALPMDilEDSS SVL GDDAACEI QKPE 538 

c e ltf ARlnlD daaG iHlllA tp e ldaqp dVvavrqRp sv< -* 
+ +n ++G ++t+++ + 
539 GIFH HNGGNRXYKTKMS VPKAP 560 




□ 1: GI "4154569" [GenPept] putative [Helicobacter pylo... BLink, PubMed, Related Sequences, Nucleotide, Genome, Taxono 



LOCUS       AAD05645      806 aa                   BCT       20-JAN-1999
DEFINITION  putative [Helicobacter pylori J99].
ACCESSION   AAD05645
PID         g4154569
VERSION     AAD05645.1  GI:4154569
DBSOURCE    locus AE001445 accession AE001445.1
KEYWORDS    .
SOURCE      Helicobacter pylori J99.
  ORGANISM  Helicobacter pylori J99
            Bacteria; Proteobacteria; epsilon subdivision; Helicobacter group;
            Helicobacter.
REFERENCE   1  (residues 1 to 806)
  AUTHORS   Alm,R.A., Ling,L.S., Moir,D.T., King,B.L., Brown,E.D., Doig,P.C.,
            Smith,D.R., Noonan,B., Guild,B.C., deJonge,B.L., Carmel,G.,
            Tummino,P.J., Caruso,A., Uria-Nickelsen,M., Mills,D.M., Ives,C.,
            Gibson,R., Merberg,D., Mills,S.D., Jiang,Q., Taylor,D.E.,
            Vovis,G.F. and Trust,T.J.
  TITLE     Genomic-sequence comparison of two unrelated isolates of the human
            gastric pathogen Helicobacter pylori
  JOURNAL   Nature 397 (6715), 176-180 (1999)
  MEDLINE   99120557
  REMARK    Erratum: [published erratum appears in Nature 1999 Feb
            25;397(6721):719]
REFERENCE   2  (residues 1 to 806)
  AUTHORS   King,B.L., Alm,R.A. and Trust,T.J.
  TITLE     Direct submission
  JOURNAL   Submitted (12-JAN-1999) Astra Research Center Boston, 128 Sidney
            Street, Cambridge, MA 02139, USA
COMMENT     Method: conceptual translation supplied by author.
FEATURES             Location/Qualifiers
     source          1..806
                     /organism="Helicobacter pylori J99"
                     /strain="J99"
                     /db_xref="taxon:85963"
     Protein         1..806
                     /product="putative"
     CDS             1..806
                     /gene="jhp0061"
                     /coded_by="4154559:5018..7438"
                     /transl_table=11
                     /note="similar to H. pylori 26695 gene HP0066"
ORIGIN
        1 mieisewlqk lddaldkwa kkepesflkp iispiedyqk svrqiqaqft dapkfneega
       61 ypqflscgll qyrgknganm efllpkvypf ppkslyiehe kdgqflreml mrllssaplv
      121 qlevilidal slggifnlar rlldknndfi yqqriltesk eieealkhlh eylkvnlqek
      181 lagfrdfvhy nenakdslpl kalflsgvda lskdalyyle kimrfgskng vlsfvnlese
      241 knnqsaedlk ryaeffkdrt sfeclkylnv eiisdqgiks qhmqdfadki kayykqkkev
      301 krelkdlqrd kefwtkssqh evsvpvgwdi nhkevcfkig neqnhtlicd hsgsgksnfl
      361 hvliqnlafy ydpdevqlfl ldykegvefn ayvadpaleh arlvsvassi sygitflkwl
      421 cdemqkradr fkqfnvkdls dyrkhekmpr liwidefqv lfsdnkstka veghlntllk
      481 kgrsygvhlv latqtmrgtd inpsfkaqia nrialpmdae dsssvlgdda aceiqkpegi
      541 fnnnggnrky htkmsvpkap ddfksfltki haefnqrnla pidrkiynge tplkmpdtlk
      601 anemrlhlgk kvdyeqkdli vefesneshl lwiqdlnar ialmkllfqn vksankelvf
      661 cnkekrlirs fdaqkeygit pvenilsvld tamnpnsalv idnlneakel hdkvgaeklk
      721 sflekaidne qycvifahdf rqiktnyhfd klkellnnhf kqclafrcng enlnaiksdl
      781 pppsklnvll ielskdsvte frpfsl
//
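
The record above reports a protein of 806 residues and a coded_by span of 4154559:5018..7438. As a rough consistency check, the arithmetic linking the two can be sketched as below. The function names are hypothetical, and the sketch assumes a simple "id:start..end" span on one strand that includes a terminal stop codon; neither assumption is stated in the record itself.

```python
def coding_span_length(coded_by: str) -> int:
    """Nucleotide length of a simple 'id:start..end' coded_by span.

    Assumes a single contiguous span (no joins or complements).
    """
    _, span = coded_by.split(":")
    start, end = (int(part) for part in span.split(".."))
    return end - start + 1


def residues_encoded(coded_by: str, includes_stop: bool = True) -> int:
    """Number of amino-acid residues implied by the span length.

    includes_stop=True is an assumption: the span is taken to end
    with a stop codon, which does not contribute a residue.
    """
    length = coding_span_length(coded_by)
    if length % 3 != 0:
        raise ValueError("span length is not a multiple of three")
    codons = length // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    span = "4154559:5018..7438"
    print(coding_span_length(span))   # nucleotides in the span
    print(residues_encoded(span))     # residues, excluding the stop codon
```

Under these assumptions the span is 2421 nucleotides, that is 807 codons, which matches the 806 residues of the record plus one stop codon.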
Target Mining
Interface 




• Enter PDB accession number (e.g. 1QMA): 1QC5 and chain (e.g. B): A

OR

• Enter one Swiss-Prot accession (e.g. P27504) or GenBank protein ID (e.g. CAB08761.1):




Iteration Filter: PSI-BLAST matches to be excluded:




If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the report
(using the first_PB_iter annotation). This allows you to focus on more remote homologues which have been
detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative iterations or
using Genome-Threader are not affected by this option. However, if one match is found during the first e.g. 3
PSI-BLAST iterations and by Genome-Threader, it will be excluded.
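
The exclusion rule described in the help text above can be sketched as follows. This is only an illustration of the stated rule, not the interface's actual code: the Hit class, its field names and the filter_hits function are hypothetical, and the first-iteration value stands in for the first_PB_iter annotation mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One match in the report (hypothetical structure)."""
    accession: str
    # First positive PSI-BLAST iteration that found the hit;
    # None if it was never found that way.
    first_psiblast_iter: Optional[int] = None
    found_by_threader: bool = False
    found_by_negative_iter: bool = False


def keep_hit(hit: Hit, excluded_iterations: int) -> bool:
    """Return True if the hit stays in the report.

    A hit is dropped only when PSI-BLAST first found it within the
    excluded iterations. Being found by Genome Threader as well does
    not rescue it, and hits found only by Genome Threader or by
    negative iterations are never dropped by this filter.
    """
    first = hit.first_psiblast_iter
    return first is None or first > excluded_iterations


def filter_hits(hits: List[Hit], excluded_iterations: int) -> List[Hit]:
    """Apply the iteration filter to a list of hits."""
    return [h for h in hits if keep_hit(h, excluded_iterations)]


if __name__ == "__main__":
    hits = [
        Hit("close_homologue", first_psiblast_iter=1, found_by_threader=True),
        Hit("remote_homologue", first_psiblast_iter=4),
        Hit("threader_only", found_by_threader=True),
        Hit("negative_iter_only", found_by_negative_iter=True),
    ]
    for h in filter_hits(hits, excluded_iterations=3):
        print(h.accession)
```

With the first 3 iterations excluded, the sketch keeps the hit first seen at iteration 4 and the two hits that PSI-BLAST never found in a positive iteration, and drops the hit first seen at iteration 1 even though Genome Threader also found it.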





2) 79 additional hits identified by both Genome Threader and PSI-BLAST:

Combined Genome Threader and PSI-BLAST output: PSI-BLAST values are shown in maroon!



Accession (BPD / WWW link) | Title | Organism | Div. | %ID (GT, PSI) | Query rgn. (GT, PSI) | Target rgn. (GT, PSI) | Aln. score (GT) | Conf. (GT) | 1st Iter. (PSI)
P56199 | INTEGRIN ALPHA-1 (LAMININ AND COLLAGEN RECEPTOR) (VLA-1) (CD49A). | Homo sapiens (Human). | PRI | 99%, 93% | 1-192, 1-192 | 140-331, 140-331 | 488 | 100% | 1
AAF01258.1 | integrin alpha-11 subunit precursor | Homo sapiens | PRI | 57.7%, 57% | 4-192, 2-192 | 163-349, 161-349 | 462 | 100% | 1
AAD51919.2 | integrin alpha 11 subunit precursor | Homo sapiens | PRI | 57.7%, 57% | 4-192, 2-192 | 163-349, 161-349 | 462 | 100% | 1
P17301 | PLATELET MEMBRANE GLYCOPROTEIN IA PRECURSOR (GPIA) (COLLAGEN RECEPTOR) (INTEGRIN ALPHA-2) (VLA-2 ALPHA CHAIN) (CD49B). | Homo sapiens (Human). | PRI | 52.9%, 52% | 4-192, 4-192 | 173-361, 173-361 | 444 | 100% | 1
Q99715 | COLLAGEN ALPHA 1(XII) CHAIN PRECURSOR. | Homo sapiens (Human). | PRI | 34.5%, 30% | 1-192, 2-187 | 136-316, 2320-2495 | 442 | 100% |
AAC31952.1 | integrin subunit alpha 10 precursor | Homo sapiens | PRI | 47.6%, 47% | 4-192, 2-192 | 166-354, 164-354 | 428 | 100% | 1
AAA59544.1 | Not given | Homo sapiens | PRI | 26.8%, 30% | 5-192, 5-192 | 150-332, 150-332 | 421 | 100% | 1
AAB24821.1 | leukocyte integrin alpha chain | Homo sapiens | PRI | 28.8%, 30% | 5-192, 5-192 | 150-332, 150-332 | 421 | 100% | 1
AAB38702.1 | cartilage matrix protein | Homo sapiens | PRI | 27.3%, 28% | 1-192, 1-190 | 271-451, 37-218 | 421 | 100% | 1
AAC01506.1 | type XII collagen | Homo sapiens | PRI | 28.9%, 29% | 1-192, 1-192 | 134-317, 134-317 | 412 | 100% | 1
Aligned annotation view for CAA57766.1 (downloading image...)






Primary database information: 

Secondary database information:
Inpharmatica calculated information:



Sequence Information 

SmirceTi 
Pfam

Protein families database of alignments and HMMs 

Home | Keyword search | Protein search | DNA search | Browse Pfam | Taxonomy search | Help



http://www.sanger.ac.uk/cgi-bin/Pfam/nph-search.cgi



Results for gi|608720|emb|CAA57766.1| 

Trusted matches - domains scoring higher than the gathering threshold 



Domain | Start | End | Bits | Evalue | Alignment
VP4 | 1 | 775 | [illegible] | [illegible] |



[775 residues] 



VP4 1-775 



Alignments of Pfam-A domains to HMMs 



Format for fetching alignments to seed




Alignment of VP4 vs gi|608720|emb|CAA57766.1|/1-775



gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|
gi|608720|



* - >M AS liYRQLL snS Y tv dls de indigs ek tqnG vtvnp GP f aqtgp a 
MASliYRQLL+nSY+Vdl dei++igsektqn vt+npGPf aqt+pa 
1 MASLIYRQLLTHSYSVDLYDEIEQIGSEKTQN-VTTNPGPFAQTRYA 46 

P vk wghge tnds ttvcp vldGPV tntiiqp ds f ndpvqpwmL lxip tndG v 
Pv+wghge+ndsttvepvldGPY qp++f++p++pw L+++++dGv 

47 PVDWGHGEINDSTTVEP VLDGPV QPTTFKPPNDYWLLISSNTDGV 91 

V ve gTnns drwiap vlvePnvtqts r qY tlf gqnk qisV dbnts dtkwkFv 
V+e+Tnnsd+w+A+++vcP+v+qt+rqV+l£g+rikq++V+i\+s +kwkF+ 
92 VVE STNH SDFWT AVIAVEPHV S QTHRQYVLF GENKQFNVEHN S -DKWKFF 140 

e vik ttldgnpvs rs tL Is dnk lagimkk gvgr lp gp rge tpnattgyp t 
e++ j c + + ++++ r+ tL+ s+n+ 1 g++k+g gr+++++ge tp+att++++ 
141 EMFK.G S S Q SDF SNRRTLT SNNRL V GMLKV G - GRVWTFHGETPRATTD S SH 189 

vsne f pniqvninvnf Y iip r s qe sk C t e Y ixtngLPp iQn t rnivp vs is 
+++ ++ni+++ i+++ f Y iip rs qe sk C+ e Y inngLPp iQn tm+vp+ s+ s 
190 T AD -LHNI SIIIHSEFYIIPRS QE SKXHE YIHN GLPPI QHTRNVVPL SL S 238 

s rslp e . r aqpN eD ivi SKtSLWKE vqYnRDI iirf kF axis iiKs G G 1GY 
s r SI+++ r aq+N eD i+I SKtSLWK£+ qYnRDI iirf kF+ns+ iK+ G G lGY 
239 SR5I QY rRAQVNEDITI SKT SLWKEM QYHRDIIIRFKF GH S VIKL G GL G Y 288 

kws ve iSFKaanY qY tY tRD G e eVnAHTT c SVN gvndf spN gG sLPATDF 
kws eiS+KaanYqY+Y RDGe+V+AHTTcSVNgvn+f spHgGsLP TDF 

289 KVS -EI SYKAAHY QY SY SRD GE QVT AHTT C SVN GVHHF 5 YH G G SLP _ TDF 336 

nis rpeVik enspvYvdYWDD S qaF rNM V Y VRsL aAns rlngvlc tgG sY 
+ isrpevik+ pvY+dYWDDS+aFrNMVYVRsLaAn ln+v+c+gGsY 

337 SI SRYEVIK.GIHY VYID YWDD SKAFRHM VYVRSLAAN — LN SVKCV G G SY 3 84 

sF aLHYIP alAe V rNKgGKVnwP vms GGaV qLhs agvTL S tqFTDF V SLN 
+F+L P V G wP+m+GGaV+Lh+agvTLStqFTDFVSLN 

385 DFRL P V G - -EWPIMH G GAV SLHFAGVTL ST QFTDFV SLH 421 

S 1RF rddF r lav dNk JLeP f F s iir trvtrS G G dlY rqG IP AanPnGY dde 
SlRFr F+l+Vd eP+Fsiirtr+ + lY G IP AanPn+ ++e 
4 22 SLRFR — F SLTVD EPSFSIIRTRTMN LY — GLPAAHPNN-GNE 459 

ypE iaG rf S II SL VP SNDDY QTPI aHY SvTVRqDLERqlxidLRe CFN SL s 
p pE++ G r£ S H SL VP+ DDY QTPI+H SvTVRqDLERqlndLReeFHsLs 





1: GI "608720" [GenPept] VP4 [Human rotavirus]   BLink, PubMed, Related Sequences, Nucleotide, Taxonomy



LOCUS       CAA57766      775 aa                             06-AUG-1996
DEFINITION  VP4 [Human rotavirus].
ACCESSION   CAA57766
PID         g608720
VERSION     CAA57766.1  GI:608720
DBSOURCE    embl locus HRVP4, accession X82323.
KEYWORDS    .
SOURCE      Human rotavirus.
  ORGANISM  Human rotavirus
            Viruses; dsRNA viruses; Reoviridae; Rotavirus.
REFERENCE   1  (residues 1 to 775)
  AUTHORS   Mahajan,N.P. and Rao,C.D.
  TITLE     Nucleotide sequence and expression in E. coli of the complete P4
            type VP4 from a G2 serotype human rotavirus
  JOURNAL   Arch. Virol. 141 (2), 315-329 (1996)
  MEDLINE   96195253
REFERENCE   2  (residues 1 to 775)
  AUTHORS   Mahajan,N.P.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-OCT-1994) N.P. Mahajan, Dept of Microbiology & Cell
            Biology, Indian Institute of Science, Bangalore 560 012, INDIA
FEATURES             Location/Qualifiers
     source          1..775
                     /organism="Human rotavirus"
                     /strain="IS-2"
                     /isolate="G serotype 2"
                     /db_xref="taxon:10341"
     Protein         1..775
                     /product="VP4"
     CDS             1..775
                     /gene="4"
                     /db_xref="SPTREMBL:Q82118"
                     /coded_by="608719:10..2337"
ORIGIN
        1 masliyrqll tnsysvdlyd eieqigsekt qnvtinpgpf aqtryapvdw ghgeindstt
       61 vepvldgpyq pttfkppndy wllissntdg wyestnnsd fvtaviavep hvsqtnrqyv
      121 lfgenkqfnv ennsdkvkff emfkgssqsd fsnrrtltsn nrlvgmlkvg grwtfhget
      181 prattdssnt adlnnisiii hsefyiiprs qeskcneyin nglppiqntr nwplslssr
      241 siqyrraqvn editisktsl wkemqynrdi iirfkfgnsv iklgglgykw seisykaany
      301 qysysrdgeq vtahttcsvn gvnnfsyngg slptdfsisr yevikgihyv yidywddska
      361 frnmvyvrsl aanlnsvkcv ggsydfrlpv gewpimngga vslhfagvtl stqftdfvsl
      421 nslrfrfslt vdepsfsiir trtmnlyglp aanpnngney yevsgrfsli slvptiddyq
      481 tpimnsvtvr qdlerqlndl reefnslsqe iamsqlidla llpldmfsmf sgikstidlt
      541 ksmatsvmkk frksklatsi semtnslsda assasrsvsi rsnfstisnv sdasksvlnv
      601 tdsvndvstq tsiiskklrl remitqtegi sfddisaavl ktkidmstqi gkntlpdwt
      661 edsekfipkr syrvlkdnev meintegkff aykvdtlnei pfdinkfael vtdspvisai
      721 idfkrlknln dnygitriea lnliksnpnv lrnfinqnnp iirnrieqli lqckl
//



Target Mining
Interface 







• Enter PDB accession number (e.g. 1QMA): lULM and chain (e.g. B):

OR

• Enter one Swiss-Prot accession (e.g. P27504) or GenBank protein ID (e.g. CAB08761.1):




Iteration Filter: PSI-BLAST matches to be excluded:




If you select e.g. "Matches detected during the first 3 iterations", these matches will be excluded from the report
(using the first_PB_iter annotation). This allows you to focus on more remote homologues which have been
detected after 4 or more PSI-BLAST iterations. Matches detected using PSI-BLAST with negative iterations or
using Genome-Threader are not affected by this option. However, if one match is found during the first e.g. 3
PSI-BLAST iterations and by Genome-Threader, it will be excluded.






Filter for the following SPECIES 








2) 81 additional hits identified by both Genome Threader and PSI-BLAST:

Combined Genome Threader and PSI-BLAST output: PSI-BLAST values are shown in maroon!



Accession (BPD / WWW link) | Title | Organism | Div. | %ID (GT, PSI) | Query rgn. (GT, PSI) | Target rgn. (GT, PSI) | Aln. score (GT) | Conf. (GT)
AAB24821.1 | leukocyte integrin alpha chain | Homo sapiens | PRI | 100%, 100% | 1-187, 1-187 | 148-334, 148-334 | 488 | 100%
AAA59544.1 | Not given | Homo sapiens | PRI | 100%, 100% | 1-187, 1-187 | 148-334, 148-334 | 488 | 100%
Q99715 | COLLAGEN ALPHA 1(XII) CHAIN PRECURSOR. | Homo sapiens (Human). | PRI | 28.9%, 28% | 2-186, 2-179 | 439-617, 2322-2494 | 456 | 100%
AAB38702.1 | cartilage matrix protein | Homo sapiens | PRI | 28.9%, 25% | 2-186, 2-186 | 274-452, 40-221 | | 100%
AAC01506.1 | type XII collagen | Homo sapiens | PRI | 28.4%, 28% | 2-186, 2-186 | 137-318, 137-318 | 445 | 100%
CAA72402.1 | collagen type XIV | Homo sapiens | PRI | 28.7%, 30% | 2-186, 2-186 | 6-185, 6-185 | 442 | 100%
AAB38547.1 | leukointegrin alpha d chain | Homo sapiens | PRI | 61%, 60% | 1-187, 1-187 | 148-334, 148-334 | 439 | 100%
CAB71222.1 | dJ238D15.1 (collagen, type XII, alpha 1) | Homo sapiens | PRI | 27.1%, 22% | 1-186, 2-186 | 293-472, 1430-1620 | 439 | 100%
CAA07569.1 | | | | | | | |



Aligned annotation view for P10155 (downloading image...)







1.AAA35433.1 



Primary database information: 
Secondary database information: 
Inpharmatica calculated information: 
Pfam

Protein families database of alignments and HMMs 

Keyword search | Protein search | DNA search | Browse Pfam | Taxonomy search | Help



Results for gi|133251|sp|P10155|RO60_HUMAN 

There were no matches to Pfam-A (including borderline matches) for gi|133251|sp|P10155|RO60_HUMAN

Matches to Pfam-B 




Domain | Start | End | Evalue | Alignment
Pfam-B_8344 | 1 | 194 | 2.3e-103 | Align
Pfam-B_10162 | 195 | 538 | 1.8e-165 | Align



[538 residues]



Alignments of Pfam-B domains to best-matching Pfam-B sequence 




Format for fetching alignments to Pfam-B families 




Query gi|133251|sp|P10155|RO60_HUMAN/1-194 matching Pfam-B_8344



Q92787 1 MEESVHQMQPLNEKQIAHSQDGYVWQVTDMHRLHRFLCFGSEGGTYYIKE 50 
MEE S VH QM QPLNEKQIAH S QD G YVWQVTDMHRLHRFL CFGSEG GTYYIKE 
gi| 133251| sp|Pl0155|R060_KUMAH 1 MEE S VN QM QPLHEKQI AH S QD GYVWQVTDMHRLHRFL CF G SE G GTYYIKE 50 

Q92787 51 QKL GLEHAEALIRLIED GRG CEVI QEIKSF S QE GRTTKQEPMLF ALAI C S 100 
QKL GLEHAEALIRLIED GRG CEVI QEIKSF S QE GRTTKQEPMLF ALAI C S 
gi| 133251| sp|Pl0155|R0G0_HUMAH 51 QKL GLEHAEALIRLIED GRG CEVI QEIKSF SQE GRTTKQEPMLF ALAI CS 100 

Q92787 101 Q C SDI STKQ AAFKAV SEV CRIPTHLFTFI QFKKDLKE SMKC GMWGRALRK 150 
Q C SDI STKQ AAFKAV SEV CRIPTHLFTFI QFKKDLKE SMKC GMWGRALRK 
gi| 133251| sp| P10155 J R060_HUMAH 101 QC SDI STKQAAFKAV SEV CRIPTHLFTFI QFKKDLKE SMKC GMWGRALRK 150 

Q92787 151 AIADWYHEKG GM ALAL AVTKYKQRH GWSHKDLLRL SHLKP S SE G 194 
AIADWYHEKG GM ALAL AVTKYKQRH GWSHKDLLRL SHLKP S SE G 
gi| 133251| sp|Pl0155|R060_HUMAH 151 AIADWYHEKG GM ALAL AVTKYKQRH GWSHKDLLRL SHLKP SSEG 194 




Query gi|133251|sp|P10155|RO60_HUMAN/195-538 matching Pfam-B_10162



008848 195 LAIVTKYITKGWKEVHEEYKEKAL SVEAEKLLKYLEAVEKVKRTKDDLEV 244 
LAIVTKYITKGWKEVHE YKEKALSVE EKLLKYLEAVEKVKRTKD+LEV 
gi| 133251| sp|Pl0155|R060_HUMAH 195 LAIVTKYITKGWKEVHELYKEKALSVETEKLLKYLEAVEKVKRTKDELEV 244 

008848 245 I HLIEEHQLVREHLLTHHLKSKEVWKALL QEMPLT ALLRHL GKMT AH SVL 294 
IHLIEEH+L VREHLLTNHLKSKEVWKALL QEMPLT ALLRHL GKMTAH S VL 
gi| 133251| sp|P10155|ROGO_HUMAH 245 IHLIEEHRLVREHLLTHHLKSKEVWKALL QEMPLT ALLRHL GKMT AH SVL 294 

O08848 295 EP GH SEV SLI CEKL SHEKLLKKARIHPFKVLI ALETYRAGHGLRGKLKWI 344 
EPGH SEV SL+ CEKL NEKLLKKARIHPFH+LIALETY+ GHGLRGKLKW 
gi 11332511 sp|Pl0155|R0 6 0_HUM AH 295 EP GH SEV SL V CEKL CHEKLLKKARIHPFHILI ALETYKT GHGLRGKLKWR 344 

O08848 345 PDKDIL QALDAAFYTTFKTVEPT GKRFLL AVD V SASMH QRAL G SVLHAST 3 94 



http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Prot...



LOCUS       RO60_HUMAN    538 aa                   PRI       01-FEB-1996
DEFINITION  60 KD RO PROTEIN (60 KD RIBONUCLEOPROTEIN RO) (RORNP) (SJOGREN
            SYNDROME TYPE A ANTIGEN (SS-A)).
ACCESSION   P10155
PID         g133251
VERSION     P10155  GI:133251
DBSOURCE    swissprot: locus RO60_HUMAN, accession P10155;
            class: standard.
            created: Mar 1, 1989.
            sequence updated: Mar 1, 1989.
            annotation updated: Feb 1, 1996.
            xrefs: gi: gi: 177782, gi: gi: 177783, gi: gi: 387656, gi: gi:
            337657, gi: gi: 36722, gi: gi: 107626
            xrefs (non-sequence databases): MIM 600063, MIM 234700, PROSITE
KEYWORDS    Ribonucleoprotein; RNA-binding; Systemic lupus erythematosus;
            Antigen.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (residues 1 to 538)
  AUTHORS   Deutscher,S.L., Harley,J.B. and Keene,J.D.
  TITLE     Molecular analysis of the 60-kDa human Ro ribonucleoprotein
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 85 (24), 9479-9483 (1988)
  MEDLINE   89071722
  REMARK    SEQUENCE FROM N.A.
REFERENCE   2  (residues 1 to 538)
  AUTHORS   Ben-Chetrit,E., Gandy,B.J., Tan,E.M. and Sullivan,K.F.
  TITLE     Isolation and characterization of a cDNA clone encoding the 60-kD
            component of the human SS-A/Ro ribonucleoprotein autoantigen
  JOURNAL   J. Clin. Invest. 83 (4), 1284-1292 (1989)
  MEDLINE   89138084
  REMARK    SEQUENCE FROM N.A.
COMMENT     This SWISS-PROT entry is copyright. It is produced through a
            collaboration between the Swiss Institute of Bioinformatics and
            the EMBL outstation - the European Bioinformatics Institute.
            The original entry is available from http://www.expasy.ch/sprot
            and http://www.ebi.ac.uk/sprot
            [FUNCTION] UNKNOWN.
            [SUBUNIT] RO SMALL RIBONUCLEOPROTEINS CONSIST OF FOUR SMALL RNA
            MOLECULES OF 85-112 NT, EACH OF WHICH IS COMPLEXED WITH A 60 KD
            PROTEIN. RO RNPS MAY ALSO CONTAIN AN ADDITIONAL 52 KD PROTEIN.
            [SUBCELLULAR LOCATION] CYTOPLASMIC.
            [DISEASE] SERA FROM PATIENTS WITH SYSTEMIC LUPUS ERYTHEMATOSUS
            OFTEN CONTAIN ANTIBODIES THAT REACT WITH THE NORMAL CELLULAR RO
            PROTEIN AS IF THIS ANTIGEN WAS FOREIGN.
            [SIMILARITY] CONTAINS 1 RNA RECOGNITION MOTIF (RNP).
            [SIMILARITY] STRONG, TO XENOPUS 60 KD RO PROTEIN.
FEATURES             Location/Qualifiers
     source          1..538
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     Protein         1..538
                     /product="60 KD RO PROTEIN"
     Region          93..98
                     /region_name="Domain"
                     /note="RNA-BINDING (RNP2) (BY SIMILARITY)."
     Region          124..131
                     /region_name="Domain"
                     /note="RNA-BINDING (RNP1) (BY SIMILARITY)."
     Region          239
                     /region_name="Conflict"
                     /note="K -> R (IN REF. 2)."
     Region          515..538
                     /region_name="Conflict"
                     /note="GMLDMCGFDTGALDVIRNFTLDMI -> ALQNTLLNKSF (IN REF.
                     2)."
ORIGIN
        1 meesvnqmqp lnekqiansq dgywqvtdm nrlhrflcfg seggtyyike qklglenaea
       61 lirliedgrg ceviqeiksf sqegrttkqe pmlfalaics qcsdistkqa afkavsevcr
      121 ipthlftfiq fkkdlkesmk cgmwgralrk aiadwynekg gmalalavtk ykqmgwshk
      181 dllrlshlkp sseglaivtk yitkgwkevh elykekalsv etekllkyle avekvkrtkd
//




